<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2017.01272</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Utturkar</surname> <given-names>Sagar M.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="author-notes" rid="fn003"><sup>&#x02020;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/281979/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Klingeman</surname> <given-names>Dawn M.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Hurt</surname> <given-names>Richard A.</given-names> <suffix>Jr.</suffix></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/34738/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Brown</surname> <given-names>Steven D.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/25862/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Graduate School of Genome Science and Technology, University of Tennessee</institution> <country>Knoxville, TN, United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Biosciences Division, Oak Ridge National Laboratory</institution> <country>Oak Ridge, TN, United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>BioEnergy Science Center</institution> <country>Oak Ridge, TN, United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Angel Angelov, Technische Universit&#x000E4;t M&#x000FC;nchen, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Gwenael Piganeau, FR3724 Observatoire Oc&#x000E9;anologique de Banyuls sur Mer (OOB), France; Hilary G. Morrison, Marine Biological Laboratory, United States</p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x0002A;Correspondence: Steven D. Brown <email>brownsd&#x00040;ornl.gov</email></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology</p></fn>
<fn fn-type="present-address" id="fn003"><p>&#x02020;Present Address: Sagar M. Utturkar, Bioinformatics Core, Purdue University, West Lafayette, IN, United States</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>18</day>
<month>07</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>8</volume>
<elocation-id>1272</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>05</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>06</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2017 Utturkar, Klingeman, Hurt and Brown.</copyright-statement>
<copyright-year>2017</copyright-year>
<copyright-holder>Utturkar, Klingeman, Hurt and Brown</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, including active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.</p></abstract>
<kwd-group>
<kwd>PacBio</kwd>
<kwd>Illumina</kwd>
<kwd>genome assembly</kwd>
<kwd>next-generation sequencing (NGS)</kwd>
<kwd>repetitive DNA</kwd>
<kwd>Pilon</kwd>
<kwd>circlator</kwd>
</kwd-group>
<contract-sponsor id="cn001">U.S. Department of Energy<named-content content-type="fundref-id">10.13039/100000015</named-content></contract-sponsor>
<counts>
<fig-count count="2"/>
<table-count count="4"/>
<equation-count count="0"/>
<ref-count count="72"/>
<page-count count="11"/>
<word-count count="8234"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Since the first Next-Generation Sequencing (NGS) platform was released by 454 Life Sciences (Margulies et al., <xref ref-type="bibr" rid="B38">2005</xref>), there has been a remarkable increase in sequencing efficiency, throughput, and read lengths (Koren and Phillippy, <xref ref-type="bibr" rid="B29">2014</xref>). Sequencing costs continue to drop dramatically and whole genome sequencing is within reach for small-scale laboratories on relatively modest budgets. During the past decade, the sequencing industry has been largely dominated by second-generation, sequencing-by-synthesis platforms such as Illumina, which are characterized by low cost, high throughput, and short reads with high accuracy (van Dijk et al., <xref ref-type="bibr" rid="B69">2014</xref>). Short sequencing reads have limited power to resolve large repetitive regions even within small microbial genomes (Chain et al., <xref ref-type="bibr" rid="B8">2009</xref>; Nagarajan and Pop, <xref ref-type="bibr" rid="B41">2013</xref>). Short read technologies are generally able to resolve microbial genomes up to the high-quality draft standard (Treangen and Salzberg, <xref ref-type="bibr" rid="B65">2012</xref>), which is sufficient for many applications such as understanding gene-coding potential, strain typing, or pan-genome analysis (Roberts et al., <xref ref-type="bibr" rid="B55">2013</xref>). However, draft genomes are fragmented assemblies that can contain misassembled regions, incorrect gene calls, and other artifacts. Fragmented assemblies are often attributed to repetitive DNA regions (such as rRNA operons), which are abundant in microbial genomes and present the greatest technical challenge to the assembly process, especially when the repetitive region is longer than the read length (Treangen and Salzberg, <xref ref-type="bibr" rid="B65">2012</xref>; Brown S. et al., <xref ref-type="bibr" rid="B6">2014</xref>). 
Finished genome sequences are high quality by definition, represent more accurate genomic information and are often desirable for model organisms and industrially important microbes (Fraser et al., <xref ref-type="bibr" rid="B18">2002</xref>; Thomma et al., <xref ref-type="bibr" rid="B63">2015</xref>).</p>
<p>The application of new protocols (e.g., use of complementary paired and mate-pair libraries) and algorithm developments have facilitated improved genome assemblies. Progress in next-generation sequencing platforms, metrics, and performances has been reviewed (Liu et al., <xref ref-type="bibr" rid="B36">2012</xref>; Quail et al., <xref ref-type="bibr" rid="B51">2012</xref>; van Dijk et al., <xref ref-type="bibr" rid="B69">2014</xref>), assessments for various assembly methods conducted (Salzberg et al., <xref ref-type="bibr" rid="B57">2012</xref>; Magoc et al., <xref ref-type="bibr" rid="B37">2013</xref>; Koren and Phillippy, <xref ref-type="bibr" rid="B29">2014</xref>; Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>), and various applications (Buermans and den Dunnen, <xref ref-type="bibr" rid="B7">2014</xref>; Rhoads and Au, <xref ref-type="bibr" rid="B53">2015</xref>) have been discussed elsewhere. So-called third-generation platforms for single-molecule sequencing are a more recent development and produce long sequence reads, which facilitate assembly. Pacific Biosciences (PacBio) RS-II instrument outputs are characterized by long reads and average read lengths are reported in the range of 10&#x02013;11 kb (Hua and Hua, <xref ref-type="bibr" rid="B23">2016</xref>). The relatively high rate of random errors within individual reads can be overcome by error-correction algorithms given sufficient sequencing depth (Chin et al., <xref ref-type="bibr" rid="B9">2013</xref>). The longest reported PacBio reads from the RS-II instrument extend well beyond 20 kb. A key aspect of longer reads is their ability to span large repetitive regions, which greatly aids the assembly process (Brown S. 
et al., <xref ref-type="bibr" rid="B6">2014</xref>; Koren and Phillippy, <xref ref-type="bibr" rid="B29">2014</xref>; Utturkar et al., <xref ref-type="bibr" rid="B67">2015</xref>) when sufficient coverage (&#x0003E;100<sc>x</sc>) is available (Chin et al., <xref ref-type="bibr" rid="B9">2013</xref>; Koren et al., <xref ref-type="bibr" rid="B30">2013</xref>). In 2014, Oxford Nanopore Technologies released a nanopore-based sequencer for long single molecule DNA reads (Feng et al., <xref ref-type="bibr" rid="B17">2015</xref>). In the time since its release, hybrid and <italic>de novo</italic> assembly strategies have also been developed and tested using Oxford Nanopore datasets (Risse et al., <xref ref-type="bibr" rid="B54">2015</xref>; Deschamps et al., <xref ref-type="bibr" rid="B13">2016</xref>).</p>
<p>The application of longer sequencing reads facilitated finished genome assemblies for many bacterial genomes (Koren et al., <xref ref-type="bibr" rid="B30">2013</xref>). The utility of long reads is demonstrated by the increasing number of finished genomes obtained using PacBio technology (Koren et al., <xref ref-type="bibr" rid="B30">2013</xref>; Brown S. D. et al., <xref ref-type="bibr" rid="B5">2014</xref>; Eckweiler et al., <xref ref-type="bibr" rid="B15">2014</xref>; Harhay et al., <xref ref-type="bibr" rid="B20">2014</xref>; Mehnaz et al., <xref ref-type="bibr" rid="B39">2014</xref>; Satou et al., <xref ref-type="bibr" rid="B58">2014</xref>; Kanda et al., <xref ref-type="bibr" rid="B27">2015</xref>; Nakano et al., <xref ref-type="bibr" rid="B42">2015</xref>). However, examples exist where genomes are only resolved into 10 or fewer contigs despite high (&#x0003E;100<sc>x</sc>) PacBio sequence coverage (Hoefler et al., <xref ref-type="bibr" rid="B22">2013</xref>; Dunitz et al., <xref ref-type="bibr" rid="B14">2014</xref>; Bishnoi et al., <xref ref-type="bibr" rid="B4">2015</xref>; Okutani et al., <xref ref-type="bibr" rid="B44">2015</xref>; Shapiro et al., <xref ref-type="bibr" rid="B59">2015</xref>; The NCTC 3000 Project, <xref ref-type="bibr" rid="B62">2016</xref>), and manual finishing is necessary to obtain complete genome sequences. Substantial developments for long read assembly methods and analysis are reported, but information is lacking on the nature of unassembled DNA regions or gaps within unfinished PacBio assemblies. Therefore, a systematic evaluation of draft, near-finished (containing up to 10 contigs) and finished genome assemblies would be useful to reveal the features and properties of the unassembled DNA regions from Illumina and/or PacBio platforms.</p>
<p>In the present study, seven bacterial genomes were sequenced using Illumina Paired-End (PE) and PacBio RS-II platforms. <italic>De novo</italic> and hybrid genome assemblies were created using platform specific or hybrid datasets from Illumina and PacBio platforms with various assembly programs and parameter optimizations. In this focused study, manual genome finishing was performed for two genomes, generating finished-grade assemblies and permitting further analysis of prior gap sequences, for which there is a dearth of data. Additional genome polishing was performed on PacBio assemblies with the recently described Pilon software (Walker et al., <xref ref-type="bibr" rid="B70">2014</xref>). The impact of improving genome assemblies and polishing was assessed by several metrics that included gene models. This study offers insights into the nature of gaps associated with Illumina and PacBio assemblies of microbial genomes, describes bioinformatics and manual steps for assembly improvement, and underlines the importance of post-assembly polishing steps for genome refinement.</p>
</sec>
<sec sec-type="materials and methods" id="s2">
<title>Materials and methods</title>
<sec>
<title>Whole genome sequencing</title>
<p>Whole genome sequencing data for seven microorganisms (<italic>Clostridium pasteurianum</italic> ATCC 6013 (Pyne et al., <xref ref-type="bibr" rid="B50">2014</xref>), <italic>Clostridium paradoxum</italic> JW-YL-7 (Lancaster et al., <xref ref-type="bibr" rid="B34">2016</xref>), <italic>Clostridium thermocellum</italic> AD2 (Utturkar et al., <xref ref-type="bibr" rid="B66">2016</xref>), <italic>Pelosinus fermentans</italic> UFO1 (Brown S. D. et al., <xref ref-type="bibr" rid="B5">2014</xref>), <italic>P. fermentans</italic> JBW45 (De Leon et al., <xref ref-type="bibr" rid="B12">2015</xref>), <italic>Halomonas</italic> sp. KO116 (O&#x00027;Dell et al., <xref ref-type="bibr" rid="B43">2015</xref>), and <italic>Bacteroides cellulosolvens</italic> DSM 2933 (Dassa et al., <xref ref-type="bibr" rid="B11">2015</xref>)) using Illumina MiSeq (Illumina, San Diego, CA, USA) (Quail et al., <xref ref-type="bibr" rid="B51">2012</xref>) and PacBio RS-II (Pacific Biosciences, Menlo Park, CA, USA) (Korlach et al., <xref ref-type="bibr" rid="B32">2010</xref>) platforms have been reported. The bacteria were chosen for the availability of Illumina and PacBio sequence data, with most having relevance to bioenergy applications, and in the case of <italic>P. fermentans</italic> species they are fermentative metal-reducing bacteria. For all genomes in the current study, Illumina paired-end library preparation, PacBio SMRTbell library preparation, and sequencing protocols were performed as described previously (Utturkar et al., <xref ref-type="bibr" rid="B67">2015</xref>). GenBank and SRA sequence accession numbers for each genome are provided in Table <xref ref-type="supplementary-material" rid="SM2">S1</xref>.</p>
</sec>
<sec>
<title>Data quality control, genome assembly, and annotation</title>
<p>Quality-based trimming of raw Illumina data was performed using CLC Genomics Workbench software (CLC) to remove bases having PHRED quality score &#x0003C;30 and any reads shorter than 20 bp. Adapter trimming and filtering of raw PacBio data was performed through SMRT Analysis software to obtain &#x0201C;filtered subreads&#x0201D; with default parameters (Utturkar et al., <xref ref-type="bibr" rid="B67">2015</xref>). <italic>De novo</italic> genome assembly of Illumina data was performed using SPAdes version 3.5.0 (Bankevich et al., <xref ref-type="bibr" rid="B2">2012</xref>) and ABySS version 1.5.2 (Simpson et al., <xref ref-type="bibr" rid="B60">2009</xref>) with parameter optimization (Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>). Hybrid assembly of Illumina and PacBio data was performed using SPAdes hybrid assembler version 3.5.0 with default parameters. Exact commands used for SPAdes and ABySS assemblies are provided in Section <xref ref-type="supplementary-material" rid="SM1">S1</xref>. Long read PacBio data were assembled using the SMRT Analysis software and the HGAP protocol (Chin et al., <xref ref-type="bibr" rid="B9">2013</xref>). In the HGAP protocol, the &#x0201C;Target Coverage&#x0201D; parameter was updated to 15X as recommended for microbial genomes (Pacific-Biosciences, <xref ref-type="bibr" rid="B46">2014a</xref>). The specific versions of SMRT Analysis software used for each genome are provided in the Results section. Assembly summary statistics were determined using Quast software version 2.3 (Gurevich et al., <xref ref-type="bibr" rid="B19">2013</xref>). Gene-calling and genome annotation were performed through the Prodigal algorithm and microbial genome annotation pipeline at Oak Ridge National Laboratory (Hyatt et al., <xref ref-type="bibr" rid="B25">2010</xref>; Woo et al., <xref ref-type="bibr" rid="B71">2014</xref>).</p>
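<p>As an illustration of the read-filtering criteria described above, the following minimal Python sketch trims low-quality bases from both ends of a read and discards reads shorter than 20 bp. It is not the algorithm implemented in CLC Genomics Workbench; the function name <monospace>trim_read</monospace>, the end-trimming strategy, and the toy input are assumptions made for illustration only.</p>
<preformat><![CDATA[
```python
def trim_read(seq, quals, min_q=30, min_len=20):
    """Trim bases with PHRED quality below min_q from both read ends.

    Hypothetical helper: a simplified stand-in for quality trimming,
    not the CLC Genomics Workbench implementation.
    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read
    is shorter than min_len.
    """
    start = 0
    end = len(seq)
    # Skip leading bases whose quality is below the threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Skip trailing bases whose quality is below the threshold.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    # Discard reads that are too short after trimming.
    if end - start < min_len:
        return None
    return seq[start:end], quals[start:end]


# Toy read (assumed data): 3 poor bases, 20 good bases, 2 poor bases.
toy_seq = "ACGT" * 6 + "A"
toy_quals = [10] * 3 + [35] * 20 + [5] * 2
trimmed = trim_read(toy_seq, toy_quals)
print(trimmed)
```
]]></preformat>
<p>In this sketch, a read that retains fewer than 20 bases after trimming is returned as <monospace>None</monospace> so that the caller can drop it from the dataset.</p>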
</sec>
<sec>
<title>Manual genome finishing</title>
<p>Manual genome finishing was performed using bioinformatics tools and PCR/Sanger sequencing. During bioinformatics steps, contigs from different draft and hybrid genome assemblies were mapped to PacBio-only assemblies using Geneious software version 8.1.6 (Biomatters, Auckland, New Zealand) (Kearse et al., <xref ref-type="bibr" rid="B28">2012</xref>) with default parameters. Mapping results were manually inspected to identify a possible extension (or overhang) relative to reference contigs. Supported extensions were added to the reference contigs and an assembly of contigs (super-assembly) was created through Geneious software to derive a longer consensus sequence. See Section <xref ref-type="supplementary-material" rid="SM1">S1</xref> for details of the Geneious software modules used in each step. Bioinformatically derived contig extensions and super-assembly derived consensus sequences were verified by PCR and Sanger sequencing. Bioinformatics finishing steps, the design of PCR/Sanger sequencing-based validations, and various experimental modifications of the standard PCR protocol are described in detail in Section <xref ref-type="supplementary-material" rid="SM1">S1</xref> with examples of two manually finished genomes (Figures <xref ref-type="supplementary-material" rid="SM1">S1</xref>, <xref ref-type="supplementary-material" rid="SM1">S2</xref>, and <xref ref-type="supplementary-material" rid="SM1">S3</xref>).</p>
</sec>
<sec>
<title>Analysis of unassembled (gap) DNA</title>
<p>Mapping of Illumina draft contigs to finished/near-finished assemblies was performed using the &#x0201C;Map to Reference&#x0201D; module in the Geneious software, followed by manual inspection to reveal Illumina gaps and associated annotations. PacBio gaps were revealed through manual finishing of two genomes and the resulting sequences were submitted to the mfold web server (Zuker, <xref ref-type="bibr" rid="B72">2003</xref>) to determine DNA folding properties and secondary structures. Default DNA folding parameters in mfold software were modified to mimic the PCR conditions (folding temperature = 55&#x000B0;C, [Na<sup>&#x0002B;</sup>] = 50 mM, [Mg<sup>2&#x0002B;</sup>] = 2.5 mM). Positional preference was determined using PerPlot and PerScan tools (Mrazek et al., <xref ref-type="bibr" rid="B40">2011</xref>) with default parameters, and genes with a periodicity intensity higher than the 2.5 cutoff were identified.</p>
</sec>
<sec>
<title>Post-assembly polishing and validation steps</title>
<p>PacBio-only assemblies were polished by running one additional round of the Quiver algorithm (Chin et al., <xref ref-type="bibr" rid="B9">2013</xref>), followed by basecall correction through Pilon software (Walker et al., <xref ref-type="bibr" rid="B70">2014</xref>) (version 1.13) with default parameters. Quiver uses PacBio reads while Pilon uses Illumina reads to perform base corrections and derive an accurate consensus sequence. The circular nature of HGAP derived contigs was assessed via the dot-plotting tool Gepard (Krumsiek et al., <xref ref-type="bibr" rid="B33">2007</xref>) and circular genome sequences were derived through an alignment approach described in the PacBio training manual (Pacific-Biosciences, <xref ref-type="bibr" rid="B48">2015</xref>). The presence of non-chromosomal DNA such as a plasmid or phage-DNA elements was tested by evaluation of any singleton sequences and/or &#x0201C;deg.fasta&#x0201D; files (which may contain high copy number sequences such as plasmids or phage DNA) generated during the HGAP protocol. For assemblies containing fewer than 5 contigs, each contig was individually tested for circularity. The presence of plasmid DNA was further analyzed by searching for annotated plasmid-related genes such as &#x0201C;RepA&#x02014;plasmid replication protein.&#x0201D; Additionally, DNA base modification analysis was performed for complete genomes using SMRT Analysis software and methylation profiles (Pacific-BioSciences, <xref ref-type="bibr" rid="B47">2014b</xref>) were determined for incorporation into the REBASE database (Roberts et al., <xref ref-type="bibr" rid="B56">2015</xref>). REBASE is a database of recognition and cleavage sites for both restriction enzymes and methyltransferases, together with methylation sensitivity. PacBio sequencing generates data on modified bases, which may be useful for related studies. 
Pilon corrections and comparison of Illumina and PacBio assemblies were further assessed by measuring the impact of nucleotide changes on protein coding potential and positive/negative influence on gene calling accuracy (see Section <xref ref-type="supplementary-material" rid="SM1">S1</xref> for details).</p>
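<p>The idea behind deriving a circular sequence from a contig whose two ends overlap can be sketched in Python as follows. This is a simplified illustration that assumes an exact, error-free terminal repeat; it is not the alignment procedure from the PacBio training manual, and the function names, the minimum overlap length, and the toy sequence are assumptions.</p>
<preformat><![CDATA[
```python
def terminal_overlap(seq, min_overlap=10):
    """Return the length of the longest prefix that is repeated exactly
    at the end of seq (at least min_overlap bases), or 0 if none exists.

    Hypothetical helper: real contigs need an alignment that tolerates
    mismatches and indels, which this exact comparison does not.
    """
    max_len = len(seq) // 2
    for k in range(max_len, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq, min_overlap=10):
    """Remove the redundant terminal copy if the contig ends overlap."""
    k = terminal_overlap(seq, min_overlap)
    if k == 0:
        return seq  # no evidence of circularity; leave the contig as is
    return seq[:len(seq) - k]


# Toy contig (assumed data): a 30 bp "genome" whose first 12 bases
# were assembled a second time at the contig end.
core = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
contig = core + core[:12]
overlap = terminal_overlap(contig)
circular = circularize(contig)
print(overlap, len(circular))
```
]]></preformat>
<p>A contig without a detectable terminal overlap is returned unchanged, which mirrors the need to test each contig individually for circularity.</p>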
</sec>
</sec>
<sec id="s3">
<title>Results and discussion</title>
<sec>
<title>Sequencing and assembly overview</title>
<p>Illumina sequence coverage for each genome is &#x0003E;200X, sufficient to derive high-quality draft genome assemblies (Haridas et al., <xref ref-type="bibr" rid="B21">2011</xref>; Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>). PacBio sequence coverage for each genome is &#x0003E;100X except for the isolates of <italic>Pelosinus</italic> sp. UFO1 (97X) and <italic>B. cellulosolvens</italic> DSM 2933 (48X). Post-trimming and filtering statistics for Illumina and PacBio data, including the number of reads, average read lengths, genome coverage, and total bases, are summarized in Tables <xref ref-type="supplementary-material" rid="SM2">S1</xref>, <xref ref-type="supplementary-material" rid="SM2">S2</xref>, respectively. Genome assemblies were performed using combinations of Illumina and PacBio platforms and various assembly programs. Consistent with previous results (Brown S. et al., <xref ref-type="bibr" rid="B6">2014</xref>), most of the genomes in the current study have superior PacBio-only assemblies (based on assembly statistics) followed by hybrid and Illumina-only assemblies, respectively. Out of seven genomes, three were assembled as complete circular chromosomes, manual finishing was performed for two genomes, and the remaining two were reported as near-finished assemblies. Details of the assembly results and manual finishing approaches are described in later sections. Using these seven genomes as a case study, we describe the best practices to obtain high-quality genome assembly using long sequence reads, post-assembly polishing steps, and gap-closure strategies for automated near-finished assemblies. The finishing approach outlined in this study includes the use of super-assemblies and supporting Illumina data to determine contig order followed by PCR and Sanger sequencing to validate contig joining. Post-finishing data were used to determine the characteristics of the unassembled DNA regions within Illumina and PacBio assemblies.</p>
</sec>
<sec>
<title>Unassembled DNA regions in PacBio-only assemblies</title>
<p>Inspection of unassembled DNA regions within PacBio assemblies was performed using five gap sequences generated through manual finishing of <italic>C. thermocellum</italic> AD2 and <italic>B. cellulosolvens</italic> DSM 2933 genomes. The unassembled DNA from PacBio assemblies was analyzed for GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations (Table <xref ref-type="table" rid="T1">1</xref>). GC content of gap sequences does not diverge markedly from the genome sequence. Four of five PacBio gaps were associated with lower than the recommended coverage for HGAP assembly (100X). Gaps AD2_Overlap1, AD2_Gap1, and BC_Gap1 were the most difficult to resolve by PCR and Sanger sequencing and had low sequence coverages (36X, 82X, and 4X, respectively), while high average sequence coverage values were present across the genomes (see Section <xref ref-type="supplementary-material" rid="SM1">S1</xref> for details).</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Properties of gap sequences present within PacBio assemblies.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Organism</bold></th>
<th valign="top" align="left"><bold>Region name</bold></th>
<th valign="top" align="center"><bold>Start</bold></th>
<th valign="top" align="center"><bold>Stop</bold></th>
<th valign="top" align="center"><bold>Length (bp)</bold></th>
<th valign="top" align="center"><bold>PacBio read coverage</bold></th>
<th valign="top" align="center"><bold>%GC</bold></th>
<th valign="top" align="left"><bold>Corresponding annotation</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>Clostridium thermocellum</italic> AD2</td>
<td valign="top" align="left">AD2_Overlap1</td>
<td valign="top" align="center">3,502</td>
<td valign="top" align="center">5,535</td>
<td valign="top" align="center">2,033</td>
<td valign="top" align="center">36x</td>
<td valign="top" align="center">39.4</td>
<td valign="top" align="left">Membrane protein insertase</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AD2_Overlap2</td>
<td valign="top" align="center">180,557</td>
<td valign="top" align="center">182,612</td>
<td valign="top" align="center">2,055</td>
<td valign="top" align="center">116x</td>
<td valign="top" align="center">35.1</td>
<td valign="top" align="left">Transposase DDE domain</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">AD2_Gap1</td>
<td valign="top" align="center">558,824</td>
<td valign="top" align="center">559,892</td>
<td valign="top" align="center">1,068</td>
<td valign="top" align="center">82x</td>
<td valign="top" align="center">39</td>
<td valign="top" align="left">Transposase mutator type</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Bacteroides cellulosolvens</italic> DSM 2933</td>
<td valign="top" align="left">BC_Overlap1</td>
<td valign="top" align="center">6,343,204</td>
<td valign="top" align="center">6,349,991</td>
<td valign="top" align="center">6,788</td>
<td valign="top" align="center">36x</td>
<td valign="top" align="center">32.5</td>
<td valign="top" align="left">Transposase Tn3 family protein</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">BC_Gap1</td>
<td valign="top" align="center">6,389,652</td>
<td valign="top" align="center">6,390,057</td>
<td valign="top" align="center">405</td>
<td valign="top" align="center">4x</td>
<td valign="top" align="center">35.5</td>
<td valign="top" align="left">RNA-binding protein</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Average sequence coverage and GC contents of the final genome assembly are provided in Table <xref ref-type="supplementary-material" rid="SM2">S2</xref></italic>.</p>
</table-wrap-foot>
</table-wrap>
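<p>The coverage comparison summarized in Table 1 can be reproduced with a short Python sketch. The coverage values are copied from Table 1 and the 100X threshold is the HGAP recommendation cited in the text; the variable and function names are illustrative assumptions.</p>
<preformat><![CDATA[
```python
# Recommended PacBio coverage for HGAP assembly, as cited in the text.
RECOMMENDED_COVERAGE = 100

# (region name, PacBio read coverage) copied from Table 1.
GAPS = [
    ("AD2_Overlap1", 36),
    ("AD2_Overlap2", 116),
    ("AD2_Gap1", 82),
    ("BC_Overlap1", 36),
    ("BC_Gap1", 4),
]


def low_coverage_regions(gaps, threshold=RECOMMENDED_COVERAGE):
    """Return the names of gap regions whose coverage is below threshold."""
    return [name for name, coverage in gaps if coverage < threshold]


low = low_coverage_regions(GAPS)
print(len(low), "of", len(GAPS), "gap regions are below the threshold:", low)
```
]]></preformat>
<p>Only AD2_Overlap2 (116X) exceeds the recommended coverage, which is why four of the five gap regions are counted as low coverage.</p>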
<p>Considering the low sequence coverage values and challenges associated with PCR amplification for the closed gap sequences, we hypothesized that PacBio gap sequences might form strong hairpin loop structures that would prevent DNA polymerase from being able to unwind and extend through the DNA region. To test our hypothesis, structural properties of gap sequences were analyzed using the mfold web server, which predicts the secondary structures or ability to form hairpin loops and associated minimum free energy (&#x00394;G) values. Mfold analysis of PacBio gap sequences revealed the potential to form small stem-loop structures, but large and/or strong secondary structural loops that might interfere with DNA polymerase and result in low sequence coverage were not identified. Significant differences were not observed between minimum free energies and secondary structures of PacBio gaps and 20 randomly selected regions from the AD2 and DSM 2933 genomes (Table <xref ref-type="supplementary-material" rid="SM2">S3</xref>). In addition, we utilized DNA periodicity criteria to determine any associations between PacBio gaps and other structural features of DNA. Regular spacing of short runs of A or T nucleotides with a DNA helical period of &#x0007E;10.5 bp (termed positional preference) has been associated with DNA curvature, supercoiling, and nucleosome positioning. Relatively rigid sections of the prokaryotic DNA (characterized by short intrinsically bent DNA segments) are proposed to be associated with strong periodic patterns while structurally flexible regions are associated with weak periods (Mrazek et al., <xref ref-type="bibr" rid="B40">2011</xref>; Tong and Mrazek, <xref ref-type="bibr" rid="B64">2014</xref>). 
Positional preference was determined for all the genomes in the current study, and regions which correspond to Illumina gaps and also have positional preference higher than 2.50 are highlighted in orange (Tables <xref ref-type="supplementary-material" rid="SM2">S4</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM2">S9</xref>). However, gaps appear to be randomly distributed relative to regions of strong/weak positional preference and a specific trend was not observed for this metric. An example of positional preference locations and Illumina/PacBio gaps in the AD2 genome is presented (Figure <xref ref-type="fig" rid="F1">1</xref>). Post-finishing, we determined that Illumina reads have uniform coverage across the gap regions. However, the short read length and repetitive nature of these regions may have prevented accurate assembly. Therefore, our initial hypothesis that resilient PacBio gaps resulted from the inability of DNA polymerase to sequence through strong hairpin loop structures was rejected.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>AD2 genome assembly comparisons. The outermost orange circle corresponds to the finished genome assembly. The next two circles show genes on the positive and negative strands, color-coded by the standard COG categories. The next yellow circle corresponds to the Illumina assembly, and gaps within the Illumina assembly are denoted by red strokes. The next circle denotes regions of strong positional preference, marked in pink. The next two concentric circles denote the sequence coverage for Illumina and PacBio technologies, respectively, as a heatmap (lowest: light blue, highest: dark blue). The innermost circle: AD2_SC1 (yellow) was generated by super assembly of draft contigs (green). AD2_HC1 (sky blue) shares a 780 kb overlap with AD2_SC1. The blue-highlighted region denotes sequence overlaps validated using the PCR/Sanger approach. A detailed illustration is provided in Figure <xref ref-type="supplementary-material" rid="SM1">S1</xref>.</p></caption>
<graphic xlink:href="fmicb-08-01272-g0001.tif"/>
</fig>
<p>For further characterization, we analyzed 1 kb DNA sequences flanking PacBio gaps (i.e., contig termini regions) from three near-finished genomes in this study. A self-BLAST was performed using the 1 kb regions as queries against the entire genome using Geneious software with default parameters. The grade scores from Geneious software (i.e., a cumulative score generated by combining the % pairwise identity, % query coverage, and e-value) for the top BLAST hits for gap termini regions are described in Table <xref ref-type="supplementary-material" rid="SM2">S10</xref>. In each genome except AD2, sequences flanking the gap regions showed high similarity (grade: &#x0003E;95%) with another region within the same genome, indicating that repetitive DNA sequences could have contributed to assembly challenges. Sequences flanking AD2_Gap1 have a low similarity score (grade: 72%) within the genome, consistent with the finding that AD2 was comparatively easier to finish using standard PCR/Sanger sequencing approaches. To further validate this observation, we repeated the flanking DNA sequence analysis steps for an independent dataset (Koren et al., <xref ref-type="bibr" rid="B30">2013</xref>). In three incomplete genome assemblies, most of the sequences flanking the gaps were determined to have high similarity (grade: &#x0003E;95%) within the same genome (Table <xref ref-type="supplementary-material" rid="SM2">S14</xref>), which may contribute to the fragmented PacBio assemblies.</p>
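<p>The flank extraction step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Geneious workflow used in the study: the function names (<monospace>contig_termini</monospace>, <monospace>count_exact_copies</monospace>) are hypothetical, and exact string matching on both strands is used only as a crude stand-in for a BLAST search, which tolerates mismatches and reports graded similarity.</p>
<preformat><![CDATA[
```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def contig_termini(contigs, flank=1000):
    """Map each contig name to its (left, right) terminal flanks.

    Contigs shorter than `flank` return the whole sequence for both ends.
    """
    return {name: (seq[:flank], seq[-flank:]) for name, seq in contigs.items()}


def count_exact_copies(query, genome):
    """Count exact, possibly overlapping matches of `query` on both strands.

    Assumption: exact matching only; a real analysis would use an aligner.
    """
    def occurrences(pattern):
        count, start = 0, genome.find(pattern)
        while start != -1:
            count += 1
            start = genome.find(pattern, start + 1)
        return count

    rc = reverse_complement(query)
    # A reverse-palindromic query hits the same sites on both strands,
    # so it is counted only once.
    if rc == query:
        return occurrences(query)
    return occurrences(query) + occurrences(rc)
```
]]></preformat>
<p>In this sketch, a flank that occurs more than once in the genome would be a candidate repeat; the grade scores reported in the study capture the same idea with a tolerance for imperfect matches.</p>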
<p>Various biological aspects of the seven genomes within this study, as well as of the <italic>C. thermocellum</italic> LQRI (LQRI) and <italic>P. fermentans</italic> DSM 17108 genomes (Utturkar et al., <xref ref-type="bibr" rid="B66">2016</xref>), were further analyzed for gaps within PacBio assemblies. Specific biological features of the genomes that likely interfered with the overall assembly process are summarized in Figure <xref ref-type="fig" rid="F2">2</xref>. The complete genome sequence of strain JBW45 was characterized by the presence of an active transposon element, which interfered with the genome circularization process (De Leon et al., <xref ref-type="bibr" rid="B12">2015</xref>). The <italic>C. paradoxum</italic> genome was reported to contain multiple rRNA operons with heterogeneous intervening sequences (15 different sequences in the variable region I) of 16S rRNA (Rainey et al., <xref ref-type="bibr" rid="B52">1996</xref>), which could contribute to the fragmented assembly. Assembly analysis of strain ATCC 6013 and <italic>P. fermentans</italic> DSM 17108 revealed a possible phage integration and the presence of a large sequence duplication. The KO116 genome was characterized by the presence of two megaplasmids. For a previously uncharacterized bacterium, in the absence of manual inspection, multiple contigs could be mistaken for a near-finished genome assembly rather than recognized as megaplasmid sequences. We expect that new tools such as plasmidSPAdes (Antipov et al., <xref ref-type="bibr" rid="B1">2016</xref>) will be useful in assembling and assessing plasmid DNA content from whole genome sequencing data. Automated HGAP assembly of LQRI produced a near-finished assembly containing two contigs. After careful evaluation, one of the contigs was found to represent a complete circular genome, while the smaller 12 kb contig was determined to be a duplicated-sequence artifact. <italic>B. cellulosolvens</italic> DSM 2933 contig termini were characterized by the presence of transposon-related genes (Dassa et al., <xref ref-type="bibr" rid="B11">2015</xref>). In summary, our initial hypothesis that structural features of DNA (hairpin loops, secondary structures, supercoiling, and nucleosome positioning) might affect PacBio coverage in certain regions, leading to assembly gaps, was not supported. In terms of assembly, it is likely that the HGAP software did not have sufficient read coverage to support automatic closure of these sequences, which resulted in assembly gaps. Analysis of gap sequences revealed that, in many cases, DNA sequences flanking the gaps have more than one copy within the genome, and some corresponded to long repetitive elements such as &#x0201C;Transposon-related proteins&#x0201D; (De Leon et al., <xref ref-type="bibr" rid="B12">2015</xref>). Further analysis revealed specific biological features, such as the presence of active mobile genetic elements, plasmid sequences, and phage integration, which can lead to fragmented PacBio assemblies. Hence, although we could not determine one specific trend, PacBio gap sequences were associated with a cumulative effect of the number of repeats and their sizes, sequence depth, and various biological features associated with specific genomes.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Summary of biological features with the potential to interfere with the assembly process. <bold>(A)</bold> Presence of active transposon elements in strain JBW45 <bold>(B)</bold> repetitive transposon sequences at the contig terminus region of <italic>B. cellulosolvens</italic> <bold>(C)</bold> large sequence duplications in <italic>C. pasteurianum</italic> <bold>(D)</bold> presence of megaplasmids in strain KO116 <bold>(E)</bold> genome duplication assembled as a spurious contig in <italic>C. thermocellum</italic> LQRI <bold>(F)</bold> multiple copies of rRNA operons in <italic>C. paradoxum</italic> JW-YL-7. The figures are illustrations only and are not drawn to scale.</p></caption>
<graphic xlink:href="fmicb-08-01272-g0002.tif"/>
</fig>
</sec>
<sec>
<title>Unassembled DNA regions in Illumina-only assemblies</title>
<p>Unassembled DNA or assembly breakpoints in Illumina assemblies were revealed by mapping the contigs from the Illumina-only assembly against the final (finished or near-finished) genome assemblies. Short reads from Illumina technology have limited power to resolve longer repetitive regions (Salzberg et al., <xref ref-type="bibr" rid="B57">2012</xref>; Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>), and rRNA operons are considered among the most difficult regions to assemble (Brown S. et al., <xref ref-type="bibr" rid="B6">2014</xref>). Our comparisons demonstrate that at least half of the total rRNA operons were completely missing (unassembled) from the Illumina assembly, while most of the remaining operons could only be assembled partially (i.e., missing one of the 5S, 16S, or 23S elements). These findings are consistent with our previous results suggesting that rRNA operons correspond to many of the breakpoints within short-read assemblies (Brown S. et al., <xref ref-type="bibr" rid="B6">2014</xref>; Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>). On the other hand, longer PacBio reads resolved the majority of rRNA operons, as is evident from the circular genome assemblies and the comparison of finished vs. draft assemblies (Tables <xref ref-type="supplementary-material" rid="SM2">S4</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM2">S9</xref>). The total number of rRNA operons present in each genome, the number of missing rRNA operons, and the number of partially assembled rRNA operons in the Illumina assembly are provided in Table <xref ref-type="table" rid="T2">2</xref>. Illumina-only assemblies often also lacked tRNA genes because of their physical linkage to incomplete rRNA operons, as well as other genes encoding putative transposases and hypothetical proteins. rRNA operons average &#x0007E;5&#x02013;7 kb in size and constituted the longest gaps within Illumina assemblies.
Apart from rRNA operons, other regions that contributed to fragmented Illumina assemblies include transposon sequences, ABC-type transporters (which number in the double digits for most genomes), RNA-directed DNA polymerases (which have long sequences and share high homology), as well as hypothetical proteins. Complete tables describing the draft vs. finished assembly comparison details (gap coordinates, length, associated annotation, and locus tags) for each genome are provided (Tables <xref ref-type="supplementary-material" rid="SM2">S4</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM2">S9</xref>), and a graphical representation of Illumina/PacBio gaps within the AD2 genome is shown (<xref ref-type="fig" rid="F1">Figure 1</xref>). The genome of <italic>C. pasteurianum</italic> was the only exception: two large contigs from the Illumina assembly were accurate and contained all the rRNA operons, and no other gaps were detected.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Summary of rRNA operons present within Illumina assembly.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Organism</bold></th>
<th valign="top" align="center"><bold>Total rRNA operons</bold></th>
<th valign="top" align="center"><bold>Number (percentage) of rRNA operons missing (unassembled) from Illumina assembly</bold></th>
<th valign="top" align="center"><bold>Number of partially assembled rRNA operons in Illumina assembly</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>Clostridium thermocellum</italic> AD2</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2 (50)</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Halomonas</italic> sp. KO116</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">4 (66)</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Pelosinus</italic> sp. UFO1</td>
<td valign="top" align="center">14</td>
<td valign="top" align="center">12 (85)</td>
<td valign="top" align="center">2</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Pelosinus fermentans</italic> JBW45</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">5 (55)</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Clostridium paradoxum</italic> JW-YL-7</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">11 (91)</td>
<td valign="top" align="center">1</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Bacteroides cellulosolvens</italic> DSM 2933</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">4 (50)</td>
<td valign="top" align="center">4</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Clostridium pasteurianum</italic> ATCC 6013</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">0 (0)</td>
<td valign="top" align="center">0</td>
</tr>
</tbody>
</table>
</table-wrap>
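<p>The per-genome percentages and overall totals in Table 2 can be recomputed with a short script. The sketch below is illustrative only: the counts are copied from Table 2, while the names <monospace>TABLE2</monospace>, <monospace>percent_missing</monospace>, and <monospace>summarize</monospace> are hypothetical, and truncating percentages to whole numbers is an assumed convention.</p>
<preformat><![CDATA[
```python
# (organism, total rRNA operons, missing from Illumina assembly, partially assembled)
# Counts copied from Table 2.
TABLE2 = [
    ("Clostridium thermocellum AD2", 4, 2, 2),
    ("Halomonas sp. KO116", 6, 4, 2),
    ("Pelosinus sp. UFO1", 14, 12, 2),
    ("Pelosinus fermentans JBW45", 9, 5, 4),
    ("Clostridium paradoxum JW-YL-7", 12, 11, 1),
    ("Bacteroides cellulosolvens DSM 2933", 8, 4, 4),
    ("Clostridium pasteurianum ATCC 6013", 10, 0, 0),
]


def percent_missing(total, missing):
    """Percentage of operons missing, truncated to a whole number (assumed convention)."""
    return 100 * missing // total


def summarize(rows):
    """Sum operon counts across genomes; 'complete' is whatever is neither missing nor partial."""
    total = sum(row[1] for row in rows)
    missing = sum(row[2] for row in rows)
    partial = sum(row[3] for row in rows)
    return {
        "total": total,
        "missing": missing,
        "partial": partial,
        "complete": total - missing - partial,
    }


if __name__ == "__main__":
    for organism, total, missing, partial in TABLE2:
        print(f"{organism}: {missing}/{total} missing ({percent_missing(total, missing)}%)")
    print(summarize(TABLE2))
```
]]></preformat>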
</sec>
<sec>
<title>Insights into assembly and polishing improvement approaches</title>
<p>A variety of assembly algorithms are available for <italic>de novo</italic> and hybrid assembly (Salzberg et al., <xref ref-type="bibr" rid="B57">2012</xref>; Magoc et al., <xref ref-type="bibr" rid="B37">2013</xref>; Koren and Phillippy, <xref ref-type="bibr" rid="B29">2014</xref>), read error correction (Lin and Liao, <xref ref-type="bibr" rid="B35">2015</xref>), scaffolding (Bashir et al., <xref ref-type="bibr" rid="B3">2012</xref>; English et al., <xref ref-type="bibr" rid="B16">2012</xref>), and genome finishing (Swain et al., <xref ref-type="bibr" rid="B61">2012</xref>) with different NGS data types. Our aim was to perform an assessment of gaps rather than an evaluation of assemblers and we chose SPAdes and ABySS to assemble Illumina data and HGAP to assemble PacBio data based on previous success (Brown S. et al., <xref ref-type="bibr" rid="B6">2014</xref>; Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>). Consistent with previous findings (Brown S. D. et al., <xref ref-type="bibr" rid="B5">2014</xref>), PacBio-only assemblies have the best statistics followed by hybrid and Illumina-only assemblies. Assembly summary statistics for <italic>de novo</italic> and hybrid assemblies are described in Table <xref ref-type="table" rid="T3">3</xref>. It is worth mentioning that using the latest versions of assembly algorithms had significant impacts on overall assembly statistics. For example, <italic>B. cellulosolvens</italic> DSM 2933 (Dassa et al., <xref ref-type="bibr" rid="B11">2015</xref>) and <italic>C. pasteurianum</italic> ATCC 6013 (Pyne et al., <xref ref-type="bibr" rid="B50">2014</xref>) genomes assembled through SMRT analysis v2.2 obtained substantial improvement over v2.0 assembly (Table <xref ref-type="table" rid="T3">3</xref>). 
The field of bioinformatics is rapidly evolving, with novel, efficient assembly algorithms such as Canu (Koren et al., <xref ref-type="bibr" rid="B31">2017</xref>) and HINGE (Kamath et al., <xref ref-type="bibr" rid="B26">2017</xref>) for long reads, and integrated pipelines (Coil et al., <xref ref-type="bibr" rid="B10">2015</xref>; Page et al., <xref ref-type="bibr" rid="B49">2016</xref>) for short reads. For future assembly projects, we recommend using multiple assembly programs to obtain the optimal assembly and using our rRNA analysis approach as an additional verification of assembly accuracy (Utturkar et al., <xref ref-type="bibr" rid="B68">2014</xref>). It is also important to analyze contigs carefully for plasmid content, which could otherwise be misinterpreted as a near-finished assembly.</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption><p>Assembly summary statistics for <italic>de novo</italic> and hybrid assemblies.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Organism</bold></th>
<th valign="top" align="left"><bold>NGS technology</bold></th>
<th valign="top" align="center"><bold>No. of contigs</bold></th>
<th valign="top" align="center"><bold>Maximum contig size (kb)</bold></th>
<th valign="top" align="center"><bold>N50 (kb)</bold></th>
<th valign="top" align="center"><bold>Genome size (Mb)</bold></th>
<th valign="top" align="left"><bold>Software</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>Clostridium thermocellum</italic> AD2</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">102</td>
<td valign="top" align="center">331</td>
<td valign="top" align="center">116</td>
<td valign="top" align="center">3.48</td>
<td valign="top" align="left">SPAdes<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">107</td>
<td valign="top" align="center">282</td>
<td valign="top" align="center">84</td>
<td valign="top" align="center">3.54</td>
<td valign="top" align="left">ABySS</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">14</td>
<td valign="top" align="center">2,270</td>
<td valign="top" align="center">2,270</td>
<td valign="top" align="center">3.57</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center">10</td>
<td valign="top" align="center">982</td>
<td valign="top" align="center">891</td>
<td valign="top" align="center">3.49</td>
<td valign="top" align="left">SMRTanalysis v 2.2</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>3,554</bold></td>
<td valign="top" align="center"><bold>3,554</bold></td>
<td valign="top" align="center"><bold>3.55</bold></td>
<td valign="top" align="left"><bold>Manual Finishing</bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Halomonas</italic> sp. KO116</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">110</td>
<td valign="top" align="center">373</td>
<td valign="top" align="center">194</td>
<td valign="top" align="center">5.13</td>
<td valign="top" align="left">SPAdes<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">120</td>
<td valign="top" align="center">315</td>
<td valign="top" align="center">115</td>
<td valign="top" align="center">5.19</td>
<td valign="top" align="left">ABySS</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">30</td>
<td valign="top" align="center">4,654</td>
<td valign="top" align="center">4,654</td>
<td valign="top" align="center">5.19</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>1 (&#x0002B; 2)<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref></bold></td>
<td valign="top" align="center"><bold>4,649</bold></td>
<td valign="top" align="center"><bold>4,649</bold></td>
<td valign="top" align="center"><bold>4.65 (&#x0002B; 0.51)<xref ref-type="table-fn" rid="TN1"><sup>a</sup></xref></bold></td>
<td valign="top" align="left"><bold>SMRTanalysis v 2.2</bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Pelosinus</italic> sp. UFO1</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">175</td>
<td valign="top" align="center">1,025</td>
<td valign="top" align="center">637</td>
<td valign="top" align="center">5.13</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">131</td>
<td valign="top" align="center">169</td>
<td valign="top" align="center">78</td>
<td valign="top" align="center">5.03</td>
<td valign="top" align="left">ABySS<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">147</td>
<td valign="top" align="center">4,498</td>
<td valign="top" align="center">4,498</td>
<td valign="top" align="center">5.19</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>5,115</bold></td>
<td valign="top" align="center"><bold>5,115</bold></td>
<td valign="top" align="center"><bold>5.12</bold></td>
<td valign="top" align="left"><bold>SMRTanalysis v 2.1<xref ref-type="table-fn" rid="TN2"><sup>b</sup></xref></bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Pelosinus fermentans</italic> JBW45</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">70</td>
<td valign="top" align="center">477</td>
<td valign="top" align="center">244</td>
<td valign="top" align="center">5.3</td>
<td valign="top" align="left">SPAdes<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">114</td>
<td valign="top" align="center">318</td>
<td valign="top" align="center">110</td>
<td valign="top" align="center">5.4</td>
<td valign="top" align="left">ABySS</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">1</td>
<td valign="top" align="center">5,381</td>
<td valign="top" align="center">5,381</td>
<td valign="top" align="center">5.38</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>5,381</bold></td>
<td valign="top" align="center"><bold>5,381</bold></td>
<td valign="top" align="center"><bold>5.38</bold></td>
<td valign="top" align="left"><bold>SMRTanalysis v 2.2</bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Clostridium paradoxum</italic> JW-YL-7</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">661</td>
<td valign="top" align="center">293</td>
<td valign="top" align="center">121</td>
<td valign="top" align="center">2.23</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">43</td>
<td valign="top" align="center">235</td>
<td valign="top" align="center">74</td>
<td valign="top" align="center">1.84</td>
<td valign="top" align="left">ABySS<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">612</td>
<td valign="top" align="center">1,061</td>
<td valign="top" align="center">323</td>
<td valign="top" align="center">2.26</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>3</bold></td>
<td valign="top" align="center"><bold>1,855</bold></td>
<td valign="top" align="center"><bold>1,855</bold></td>
<td valign="top" align="center"><bold>1.93</bold></td>
<td valign="top" align="left"><bold>SMRTanalysis v 2.2</bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Bacteroides cellulosolvens</italic> DSM 2933</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">194</td>
<td valign="top" align="center">1,143</td>
<td valign="top" align="center">271</td>
<td valign="top" align="center">6.81</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">172</td>
<td valign="top" align="center">358</td>
<td valign="top" align="center">130</td>
<td valign="top" align="center">6.99</td>
<td valign="top" align="left">ABySS<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">122</td>
<td valign="top" align="center">3,522</td>
<td valign="top" align="center">3,522</td>
<td valign="top" align="center">6.91</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center">12</td>
<td valign="top" align="center">2,261</td>
<td valign="top" align="center">1,340</td>
<td valign="top" align="center">6.94</td>
<td valign="top" align="left">SMRTanalysis v 2.0<xref ref-type="table-fn" rid="TN2"><sup>b</sup></xref></td>
</tr>
<tr>
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center">3</td>
<td valign="top" align="center">6,349</td>
<td valign="top" align="center">6,349</td>
<td valign="top" align="center">6.88</td>
<td valign="top" align="left">SMRTanalysis v 2.2</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>1</bold></td>
<td valign="top" align="center"><bold>6,878</bold></td>
<td valign="top" align="center"><bold>6,878</bold></td>
<td valign="top" align="center"><bold>6.87</bold></td>
<td valign="top" align="left"><bold>Manual Finishing</bold></td>
</tr> <tr>
<td valign="top" align="left"><italic>Clostridium pasteurianum</italic> ATCC 6013</td>
<td valign="top" align="left">Illumina</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">4,108</td>
<td valign="top" align="center">4,108</td>
<td valign="top" align="center">4.36</td>
<td valign="top" align="left">SPAdes<sup>&#x0002A;</sup></td>
</tr>
<tr>
<td/>
<td/>
<td valign="top" align="center">101</td>
<td valign="top" align="center">207</td>
<td valign="top" align="center">73</td>
<td valign="top" align="center">4.35</td>
<td valign="top" align="left">ABySS</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">Illumina &#x0002B; PacBio</td>
<td valign="top" align="center">9</td>
<td valign="top" align="center">4,022</td>
<td valign="top" align="center">4,022</td>
<td valign="top" align="center">4.36</td>
<td valign="top" align="left">SPAdes</td>
</tr>
<tr>
<td/>
<td valign="top" align="left">PacBio-only</td>
<td valign="top" align="center"><bold>2</bold></td>
<td valign="top" align="center"><bold>4,374</bold></td>
<td valign="top" align="center"><bold>4,374</bold></td>
<td valign="top" align="center"><bold>4.39</bold></td>
<td valign="top" align="left"><bold>SMRTanalysis v 2.2</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Best assemblies are shown in bold. The best draft assemblies achieved with only the Illumina data are marked with <sup>&#x0002A;</sup>.</italic></p>
<fn id="TN1">
<label>a</label>
<p><italic>Additional numbers shown in brackets correspond to the extra-chromosomal plasmid DNA.</italic></p></fn>
<fn id="TN2">
<label>b</label>
<p><italic>Assemblies performed prior to the availability of SMRTanalysis version 2.2. Prior assemblies are included to describe the effectiveness of algorithm improvement on genome assembly using the same data</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
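<p>Table 3 reports N50 values without defining the statistic. Under the commonly used definition (assumed here, since the text does not spell it out), N50 is the length of the shortest contig in the smallest set of largest contigs that together cover at least half of the total assembly length. A minimal sketch, with the hypothetical function name <monospace>n50</monospace>:</p>
<preformat><![CDATA[
```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```
]]></preformat>
<p>For a single-contig assembly the N50 equals the contig size, which is why the maximum contig size and N50 columns coincide for the finished genomes in Table 3.</p>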
<p>Long-read sequencing platforms are criticized for their frequent (&#x0007E;15%) errors, but the random errors of the PacBio platform can be corrected by using high (&#x0003E;100x) sequence coverage and/or Illumina data. However, uniform sequence coverage across the entire genome is not guaranteed, and low-coverage regions are prone to base-call errors. Assembly polishing is a crucial step to obtain an accurate consensus sequence and facilitate downstream applications. The two assembly base-call correction algorithms applied in this study are Quiver (correction using PacBio reads) and Pilon (correction using Illumina reads); iCORN (Otto et al., <xref ref-type="bibr" rid="B45">2010</xref>) is another alternative. The default HGAP protocol is implemented with a single round of Quiver polishing, and we applied additional rounds of Pilon correction for further assembly quality improvements. The majority of the base-call errors corrected by Pilon were insertions-deletions (indels) (Table <xref ref-type="table" rid="T4">4</xref> and Tables <xref ref-type="supplementary-material" rid="SM2">S11</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM2">S14</xref>), which were responsible for frameshift mutations and the corresponding altered open reading frame (ORF) predictions. To validate the accuracy of Pilon calls, 47 random Pilon corrections across four finished genomes were verified by PCR and Sanger sequencing. Our results indicate that 40 of the 47 (&#x0007E;85%) tested corrections by Pilon were accurate and supported by two (forward and reverse) high-quality Sanger reads. The remaining seven suggested Pilon modifications were ruled out based on lack of support from the Sanger data.</p>
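<p>The reason random errors can be averaged out by coverage can be illustrated with a deliberately simplified model. The sketch below assumes that each read is wrong at a given position independently with a fixed probability, and that the consensus is wrong only when more than half of the reads are wrong. Both are assumptions made for illustration; real polishing tools such as Quiver and Pilon use more elaborate models, and the function name <monospace>majority_error_probability</monospace> is hypothetical.</p>
<preformat><![CDATA[
```python
from math import comb


def majority_error_probability(coverage, per_read_error):
    """Probability that more than half of `coverage` reads are wrong at one position.

    Simplified binomial model: errors are independent across reads and occur
    with probability `per_read_error` in each read.
    """
    threshold = coverage // 2 + 1  # smallest number of wrong reads forming a majority
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # Compare a few coverage depths at a 15% per-read error rate.
    for depth in (1, 5, 25, 100):
        print(depth, majority_error_probability(depth, 0.15))
```
]]></preformat>
<p>Under this model the chance of a wrong majority falls rapidly as coverage grows, which also illustrates why low-coverage regions remain prone to base-call errors.</p>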
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption><p>Summary of Pilon call verification by Sanger sequencing.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Genome</bold></th>
<th valign="top" align="center"><bold>Total number of SNP<xref ref-type="table-fn" rid="TN3"><sup>&#x0002A;</sup></xref> verified by Sanger</bold></th>
<th valign="top" align="center"><bold>Total number of correct calls</bold></th>
<th valign="top" align="center"><bold>Total number of incorrect calls</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left"><italic>Pelosinus fermentans</italic> JBW45</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">11</td>
<td valign="top" align="center">0</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Clostridium thermocellum</italic> AD2</td>
<td valign="top" align="center">22</td>
<td valign="top" align="center">17</td>
<td valign="top" align="center">5</td>
</tr>
<tr>
<td valign="top" align="left"><italic>Pelosinus</italic> sp. UFO1</td>
<td valign="top" align="center">6</td>
<td valign="top" align="center">4</td>
<td valign="top" align="center">2</td>
</tr>
<tr style="border-bottom: thin solid #000000;">
<td valign="top" align="left"><italic>Halomonas</italic> sp. KO116</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">8</td>
<td valign="top" align="center">0</td>
</tr> <tr>
<td valign="top" align="left">Total</td>
<td valign="top" align="center">47</td>
<td valign="top" align="center">40</td>
<td valign="top" align="center">7</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TN3"><label>&#x0002A;</label><p><italic>SNP refers to polymorphisms as well as indels. 19 of the 47 SNP calls were indels, and 1 of the 7 incorrect SNP calls was an indel</italic>.</p></fn>
</table-wrap-foot>
</table-wrap>
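<p>The overall verification rate quoted in the text follows directly from the per-genome counts in Table 4. A minimal sketch of that tally, with the counts copied from Table 4 and the hypothetical names <monospace>TABLE4</monospace> and <monospace>verification_summary</monospace>:</p>
<preformat><![CDATA[
```python
# genome -> (corrections verified by Sanger, correct calls, incorrect calls)
# Counts copied from Table 4.
TABLE4 = {
    "Pelosinus fermentans JBW45": (11, 11, 0),
    "Clostridium thermocellum AD2": (22, 17, 5),
    "Pelosinus sp. UFO1": (6, 4, 2),
    "Halomonas sp. KO116": (8, 8, 0),
}


def verification_summary(table):
    """Return (tested, correct, incorrect, percent_correct) summed over all genomes."""
    tested = sum(row[0] for row in table.values())
    correct = sum(row[1] for row in table.values())
    incorrect = sum(row[2] for row in table.values())
    # Sanity check: every tested call is classified as correct or incorrect.
    if correct + incorrect != tested:
        raise ValueError("correct and incorrect calls do not add up to the total")
    return tested, correct, incorrect, round(100 * correct / tested, 1)


if __name__ == "__main__":
    print(verification_summary(TABLE4))
```
]]></preformat>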
<p>Further evaluation of Pilon corrections was performed by measuring the changes in protein coding potential and the positive/negative influence on gene calling accuracy (Section <xref ref-type="supplementary-material" rid="SM1">S1</xref>). In most cases, Pilon corrections improved the protein coding potential by predicting longer ORFs, joined ORFs (previously split genes were joined together to represent a single long ORF), and a few novel ORFs (Tables <xref ref-type="supplementary-material" rid="SM2">S11</xref>&#x02013;<xref ref-type="supplementary-material" rid="SM2">S14</xref>). Certain Pilon corrections resulted in split ORFs (previously longer ORFs were split into two ORFs), but such cases were fewer than the improved ORFs. Moreover, most of the changed ORFs were associated with improved BLASTN results (e-value, percent similarity, percent identity, and subject length), suggesting enhanced gene-calling accuracy. To summarize, a total of 314 modifications were suggested by Pilon across four finished genomes, of which 154 (49%) resulted in improved protein coding potential (longer/joined/novel ORFs), 38 (12%) were associated with split/shorter ORFs, and 122 (39%) produced no change. Considering the BLASTN results, 183 (58%) corrections had a positive influence on gene calling accuracy, 35 (11%) corrections deteriorated the BLASTN results, and 96 (31%) produced no change. Pilon is a useful tool for <italic>in silico</italic> genome refinement and is recommended when Illumina data are available.</p>
<p>Another important aspect of finished genome sequences is an accurate representation of a circular chromosome. Automatically finished assemblies generated through HGAP often have (duplicated) overlapping ends, which need to be trimmed off for the final assembly. This can be achieved using the circlator software (Hunt et al., <xref ref-type="bibr" rid="B24">2015</xref>), which performs automated assembly circularization and sets the <italic>dnaA</italic> gene as the starting position. In this study, assembly circularization was performed manually through a read mapping and alignment approach, before the circlator software became available. Later, a comparison of circlator and manual assemblies was performed and the results were similar (data not shown). Therefore, for future projects, we recommend applying the circlator software and then carefully inspecting the trimmed regions.</p>
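<p>The idea of trimming duplicated ends and resetting the start position can be sketched as follows. This is a toy illustration, not the manual procedure used in this study and not circlator's algorithm: it assumes the duplicated ends match exactly, whereas real contig ends differ by sequencing errors and need an alignment-based comparison. The function names and the <monospace>min_overlap</monospace> parameter are hypothetical.</p>
<preformat><![CDATA[
```python
def find_end_overlap(seq, min_overlap=10):
    """Return the length of the longest prefix of `seq` that is also its suffix.

    Only overlaps between `min_overlap` and half the contig length are
    considered; 0 is returned when no such exact overlap exists.
    """
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular(seq, min_overlap=10):
    """Remove the duplicated end of a contig that overlaps its own start."""
    k = find_end_overlap(seq, min_overlap)
    # Guard against k == 0: seq[:-0] would be an empty string.
    return seq[:-k] if k else seq


def rotate(seq, start):
    """Rotate a circular sequence so that position `start` becomes the first base."""
    return seq[start:] + seq[:start]
```
]]></preformat>
<p>After trimming, <monospace>rotate</monospace> could be used to place a chosen start coordinate, such as the first base of <italic>dnaA</italic>, at the beginning of the sequence; the trimmed region should still be inspected by read mapping.</p>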
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>Conclusions</title>
<p>In this study, we present an effective manual finishing approach targeted toward near-finished microbial genome assemblies. The importance of genome polishing steps is demonstrated through their positive influence on gene calling accuracy and improved protein coding potential, which will be useful to others looking to improve long-read assemblies. Assessment of Illumina gaps confirmed previous findings that repetitive rRNA operons are major contributors to fragmented short-read assemblies. For PacBio assemblies, our initial hypothesis that structural features of DNA might affect PacBio sequence coverage and thereby lead to assembly gaps was not supported. However, we demonstrated that certain biological features, such as the presence of active transposons, plasmid sequences, and phage integration, are possible reasons for assembly fragmentation. Additionally, DNA regions flanking the PacBio gap sequences showed high degrees of similarity with other loci and are likely contributors to incomplete PacBio assemblies in this dataset. The PacBio gap sequences in this study are attributed to the cumulative effect of repetitive DNA content and biological features specific to each genome. Despite a few limitations, long reads from third-generation sequencing, in this case from the PacBio platform, are particularly advantageous for generating <italic>de novo</italic> microbial genome assemblies. Our datasets and analyses will aid future efforts to better understand and overcome unassembled DNA from PacBio assemblies.</p>
</sec>
<sec id="s5">
<title>Author contributions</title>
<p>SU designed the study, performed or contributed to all experiments and analyses, and wrote the manuscript draft; DK extracted genomic DNA, performed Illumina sequencing, and assisted with PCR and Sanger sequencing; RH contributed to study design and edited the manuscript; SB contributed to study design and assisted with drafting and editing the manuscript. All authors reviewed and approved the manuscript.</p>
<sec>
<title>Conflict of interest statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
</sec>
</body>
<back>
<ack><p>This work was submitted as a partial requirement for SU&#x00027;s PhD thesis, and we thank his thesis committee (Mitch Doktycz, Dale Pelletier, and Chris Schadt, ORNL, and Gladys Alexandre, UT) for helpful suggestions and guidance. Miriam Land (ORNL) is acknowledged for her support in maintaining the Microbial Genome Annotation Pipeline, which facilitated annotations and gene model comparisons. Some sequence data analyzed in this study were published earlier in collaboration with the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sanger sequencing data were generated at the Molecular Biology Resource Facility at the University of Tennessee, Knoxville.</p>
</ack>
<sec sec-type="supplementary-material" id="s6">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="http://journal.frontiersin.org/article/10.3389/fmicb.2017.01272/full#supplementary-material">http://journal.frontiersin.org/article/10.3389/fmicb.2017.01272/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="DataSheet1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
<supplementary-material xlink:href="Table1.XLSX" id="SM2" mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Antipov</surname> <given-names>D.</given-names></name> <name><surname>Hartwick</surname> <given-names>N.</given-names></name> <name><surname>Shen</surname> <given-names>M.</given-names></name> <name><surname>Raiko</surname> <given-names>M.</given-names></name> <name><surname>Lapidus</surname> <given-names>A.</given-names></name> <name><surname>Pevzner</surname> <given-names>P. A.</given-names></name></person-group> (<year>2016</year>). <article-title>plasmidSPAdes: assembling plasmids from whole genome sequencing data</article-title>. <source>Bioinformatics</source> <volume>32</volume>, <fpage>3380</fpage>&#x02013;<lpage>3387</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btw493</pub-id><pub-id pub-id-type="pmid">27466620</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bankevich</surname> <given-names>A.</given-names></name> <name><surname>Nurk</surname> <given-names>S.</given-names></name> <name><surname>Antipov</surname> <given-names>D.</given-names></name> <name><surname>Gurevich</surname> <given-names>A. A.</given-names></name> <name><surname>Dvorkin</surname> <given-names>M.</given-names></name> <name><surname>Kulikov</surname> <given-names>A. S.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing</article-title>. <source>J. Comput. Biol.</source> <volume>19</volume>, <fpage>455</fpage>&#x02013;<lpage>477</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2012.0021</pub-id><pub-id pub-id-type="pmid">22506599</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bashir</surname> <given-names>A.</given-names></name> <name><surname>Klammer</surname> <given-names>A. A.</given-names></name> <name><surname>Robins</surname> <given-names>W. P.</given-names></name> <name><surname>Chin</surname> <given-names>C. S.</given-names></name> <name><surname>Webster</surname> <given-names>D.</given-names></name> <name><surname>Paxinos</surname> <given-names>E.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>A hybrid approach for the automated finishing of bacterial genomes</article-title>. <source>Nat. Biotechnol.</source> <volume>30</volume>, <fpage>701</fpage>&#x02013;<lpage>707</lpage>. <pub-id pub-id-type="doi">10.1038/nbt.2288</pub-id><pub-id pub-id-type="pmid">22750883</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bishnoi</surname> <given-names>U.</given-names></name> <name><surname>Polson</surname> <given-names>S. W.</given-names></name> <name><surname>Sherrier</surname> <given-names>D. J.</given-names></name> <name><surname>Bais</surname> <given-names>H. P.</given-names></name></person-group> (<year>2015</year>). <article-title>Draft genome sequence of a natural root isolate, <italic>Bacillus subtilis</italic> UD1022, a potential plant growth-promoting biocontrol agent</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00696</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00696-15</pub-id><pub-id pub-id-type="pmid">26159522</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>S. D.</given-names></name> <name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Magnuson</surname> <given-names>T. S.</given-names></name> <name><surname>Ray</surname> <given-names>A. E.</given-names></name> <name><surname>Poole</surname> <given-names>F. L.</given-names></name> <name><surname>Lancaster</surname> <given-names>W. A.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Complete genome sequence of <italic>Pelosinus</italic> sp. strain UFO1 assembled using Single-Molecule Real-Time DNA sequencing technology</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00881</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00881-14</pub-id><pub-id pub-id-type="pmid">25189589</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname> <given-names>S.</given-names></name> <name><surname>Nagaraju</surname> <given-names>S.</given-names></name> <name><surname>Utturkar</surname> <given-names>S.</given-names></name> <name><surname>De Tissera</surname> <given-names>S.</given-names></name> <name><surname>Segovia</surname> <given-names>S.</given-names></name> <name><surname>Mitchell</surname> <given-names>W.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of <italic>Clostridium autoethanogenum</italic> and analysis of CRISPR systems in industrial relevant Clostridia</article-title>. <source>Biotechnol. Biofuels</source> <volume>7</volume>:<fpage>40</fpage>. <pub-id pub-id-type="doi">10.1186/1754-6834-7-40</pub-id><pub-id pub-id-type="pmid">24655715</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Buermans</surname> <given-names>H. P.</given-names></name> <name><surname>den Dunnen</surname> <given-names>J. T.</given-names></name></person-group> (<year>2014</year>). <article-title>Next generation sequencing technology: advances and applications</article-title>. <source>Biochim. Biophys. Acta</source> <volume>1842</volume>, <fpage>1932</fpage>&#x02013;<lpage>1941</lpage>. <pub-id pub-id-type="doi">10.1016/j.bbadis.2014.06.015</pub-id><pub-id pub-id-type="pmid">24995601</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chain</surname> <given-names>P. S.</given-names></name> <name><surname>Grafham</surname> <given-names>D. V.</given-names></name> <name><surname>Fulton</surname> <given-names>R. S.</given-names></name> <name><surname>Fitzgerald</surname> <given-names>M. G.</given-names></name> <name><surname>Hostetler</surname> <given-names>J.</given-names></name> <name><surname>Muzny</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2009</year>). <article-title>Genomics. Genome project standards in a new era of sequencing</article-title>. <source>Science</source> <volume>326</volume>, <fpage>236</fpage>&#x02013;<lpage>237</lpage>. <pub-id pub-id-type="doi">10.1126/science.1180614</pub-id><pub-id pub-id-type="pmid">19815760</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chin</surname> <given-names>C. S.</given-names></name> <name><surname>Alexander</surname> <given-names>D. H.</given-names></name> <name><surname>Marks</surname> <given-names>P.</given-names></name> <name><surname>Klammer</surname> <given-names>A. A.</given-names></name> <name><surname>Drake</surname> <given-names>J.</given-names></name> <name><surname>Heiner</surname> <given-names>C.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data</article-title>. <source>Nat. Methods</source> <volume>10</volume>, <fpage>563</fpage>&#x02013;<lpage>569</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2474</pub-id><pub-id pub-id-type="pmid">23644548</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coil</surname> <given-names>D.</given-names></name> <name><surname>Jospin</surname> <given-names>G.</given-names></name> <name><surname>Darling</surname> <given-names>A. E.</given-names></name></person-group> (<year>2015</year>). <article-title>A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data</article-title>. <source>Bioinformatics</source> <volume>31</volume>, <fpage>587</fpage>&#x02013;<lpage>589</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu661</pub-id><pub-id pub-id-type="pmid">25338718</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dassa</surname> <given-names>B.</given-names></name> <name><surname>Utturkar</surname> <given-names>S.</given-names></name> <name><surname>Hurt</surname> <given-names>R. A.</given-names></name> <name><surname>Klingeman</surname> <given-names>D. M.</given-names></name> <name><surname>Keller</surname> <given-names>M.</given-names></name> <name><surname>Xu</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Near-complete genome sequence of the cellulolytic bacterium <italic>Bacteroides</italic> (<italic>Pseudobacteroides</italic>) <italic>cellulosolvens</italic> ATCC 35603</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e01022</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01022-15</pub-id><pub-id pub-id-type="pmid">26404597</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>De Leon</surname> <given-names>K. B.</given-names></name> <name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Camilleri</surname> <given-names>L. B.</given-names></name> <name><surname>Elias</surname> <given-names>D. A.</given-names></name> <name><surname>Arkin</surname> <given-names>A. P.</given-names></name> <name><surname>Fields</surname> <given-names>M. W.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Complete genome sequence of <italic>Pelosinus fermentans</italic> JBW45, a member of a remarkably competitive group of negativicutes in the firmicutes phylum</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e01090</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01090-15</pub-id><pub-id pub-id-type="pmid">26404608</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deschamps</surname> <given-names>S.</given-names></name> <name><surname>Mudge</surname> <given-names>J.</given-names></name> <name><surname>Cameron</surname> <given-names>C.</given-names></name> <name><surname>Ramaraj</surname> <given-names>T.</given-names></name> <name><surname>Anand</surname> <given-names>A.</given-names></name> <name><surname>Fengler</surname> <given-names>K.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Characterization, correction and de novo assembly of an Oxford Nanopore genomic dataset from <italic>Agrobacterium tumefaciens</italic></article-title>. <source>Sci. Rep.</source> <volume>6</volume>:<fpage>28625</fpage>. <pub-id pub-id-type="doi">10.1038/srep28625</pub-id><pub-id pub-id-type="pmid">27350167</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dunitz</surname> <given-names>M. I.</given-names></name> <name><surname>Coil</surname> <given-names>D. A.</given-names></name> <name><surname>Jospin</surname> <given-names>G.</given-names></name> <name><surname>Eisen</surname> <given-names>J. A.</given-names></name> <name><surname>Adams</surname> <given-names>J. Y.</given-names></name></person-group> (<year>2014</year>). <article-title>Draft genome sequences of <italic>Escherichia coli</italic> strains isolated from septic patients</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e01278</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01278-14</pub-id><pub-id pub-id-type="pmid">25523766</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eckweiler</surname> <given-names>D.</given-names></name> <name><surname>Bunk</surname> <given-names>B.</given-names></name> <name><surname>Sproer</surname> <given-names>C.</given-names></name> <name><surname>Overmann</surname> <given-names>J.</given-names></name> <name><surname>Haussler</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Complete genome sequence of highly adherent <italic>Pseudomonas aeruginosa</italic> small-colony variant SCV20265</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e01232</fpage>-<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01232-13</pub-id><pub-id pub-id-type="pmid">24459283</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>English</surname> <given-names>A. C.</given-names></name> <name><surname>Richards</surname> <given-names>S.</given-names></name> <name><surname>Han</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>M.</given-names></name> <name><surname>Vee</surname> <given-names>V.</given-names></name> <name><surname>Qu</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology</article-title>. <source>PLoS ONE</source> <volume>7</volume>:<fpage>e47768</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0047768</pub-id><pub-id pub-id-type="pmid">23185243</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Feng</surname> <given-names>Y.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Ying</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>D.</given-names></name> <name><surname>Du</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Nanopore-based fourth-generation DNA sequencing technology</article-title>. <source>Genomics Proteomics Bioinformatics</source> <volume>13</volume>, <fpage>4</fpage>&#x02013;<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1016/j.gpb.2015.01.009</pub-id><pub-id pub-id-type="pmid">25743089</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fraser</surname> <given-names>C. M.</given-names></name> <name><surname>Eisen</surname> <given-names>J. A.</given-names></name> <name><surname>Nelson</surname> <given-names>K. E.</given-names></name> <name><surname>Paulsen</surname> <given-names>I. T.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2002</year>). <article-title>The value of complete microbial genome sequencing (you get what you pay for)</article-title>. <source>J. Bacteriol.</source> <volume>184</volume>, <fpage>6403</fpage>&#x02013;<lpage>6405</lpage>. <pub-id pub-id-type="doi">10.1128/JB.184.23.6403-6405.2002</pub-id><pub-id pub-id-type="pmid">12426324</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gurevich</surname> <given-names>A.</given-names></name> <name><surname>Saveliev</surname> <given-names>V.</given-names></name> <name><surname>Vyahhi</surname> <given-names>N.</given-names></name> <name><surname>Tesler</surname> <given-names>G.</given-names></name></person-group> (<year>2013</year>). <article-title>QUAST: quality assessment tool for genome assemblies</article-title>. <source>Bioinformatics</source> <volume>29</volume>, <fpage>1072</fpage>&#x02013;<lpage>1075</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btt086</pub-id><pub-id pub-id-type="pmid">23422339</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Harhay</surname> <given-names>G. P.</given-names></name> <name><surname>McVey</surname> <given-names>D. S.</given-names></name> <name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name> <name><surname>Bono</surname> <given-names>J.</given-names></name> <name><surname>Harhay</surname> <given-names>D. M.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Complete closed genome sequences of three <italic>Bibersteinia trehalosi</italic> nasopharyngeal isolates from cattle with shipping fever</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00084</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00084-14</pub-id><pub-id pub-id-type="pmid">24526647</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haridas</surname> <given-names>S.</given-names></name> <name><surname>Breuill</surname> <given-names>C.</given-names></name> <name><surname>Bohlmann</surname> <given-names>J.</given-names></name> <name><surname>Hsiang</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>A biologist&#x00027;s guide to <italic>de novo</italic> genome assembly using next-generation sequence data: A test with fungal genomes</article-title>. <source>J. Microbiol. Methods</source> <volume>86</volume>, <fpage>368</fpage>&#x02013;<lpage>375</lpage>. <pub-id pub-id-type="doi">10.1016/j.mimet.2011.06.019</pub-id><pub-id pub-id-type="pmid">21749903</pub-id></citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hoefler</surname> <given-names>B. C.</given-names></name> <name><surname>Konganti</surname> <given-names>K.</given-names></name> <name><surname>Straight</surname> <given-names>P. D.</given-names></name></person-group> (<year>2013</year>). <article-title><italic>De Novo</italic> assembly of the <italic>Streptomyces</italic> sp. strain Mg1 genome using PacBio single-molecule sequencing</article-title>. <source>Genome Announc.</source> <volume>1</volume>:<fpage>e00535</fpage>-<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00535-13</pub-id><pub-id pub-id-type="pmid">23908282</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hua</surname> <given-names>X.</given-names></name> <name><surname>Hua</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Improved complete genome sequence of the extremely radioresistant bacterium <italic>Deinococcus radiodurans</italic> R1 obtained using PacBio single-molecule sequencing</article-title>. <source>Genome Announc.</source> <volume>4</volume>:<fpage>e00886</fpage>-<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00886-16</pub-id><pub-id pub-id-type="pmid">27587813</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hunt</surname> <given-names>M.</given-names></name> <name><surname>Silva</surname> <given-names>N. D.</given-names></name> <name><surname>Otto</surname> <given-names>T. D.</given-names></name> <name><surname>Parkhill</surname> <given-names>J.</given-names></name> <name><surname>Keane</surname> <given-names>J. A.</given-names></name> <name><surname>Harris</surname> <given-names>S. R.</given-names></name></person-group> (<year>2015</year>). <article-title>Circlator: automated circularization of genome assemblies using long sequencing reads</article-title>. <source>Genome Biol.</source> <volume>16</volume>:<fpage>294</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-015-0849-0</pub-id><pub-id pub-id-type="pmid">26714481</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hyatt</surname> <given-names>D.</given-names></name> <name><surname>Chen</surname> <given-names>G. L.</given-names></name> <name><surname>Locascio</surname> <given-names>P. F.</given-names></name> <name><surname>Land</surname> <given-names>M. L.</given-names></name> <name><surname>Larimer</surname> <given-names>F. W.</given-names></name> <name><surname>Hauser</surname> <given-names>L. J.</given-names></name></person-group> (<year>2010</year>). <article-title>Prodigal: prokaryotic gene recognition and translation initiation site identification</article-title>. <source>BMC Bioinformatics</source> <volume>11</volume>:<fpage>119</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id><pub-id pub-id-type="pmid">20211023</pub-id></citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kamath</surname> <given-names>G. M.</given-names></name> <name><surname>Shomorony</surname> <given-names>I.</given-names></name> <name><surname>Xia</surname> <given-names>F.</given-names></name> <name><surname>Courtade</surname> <given-names>T. A.</given-names></name> <name><surname>Tse</surname> <given-names>D. N.</given-names></name></person-group> (<year>2017</year>). <article-title>HINGE: long-read assembly achieves optimal repeat resolution</article-title>. <source>Genome Res.</source> <volume>27</volume>, <fpage>747</fpage>&#x02013;<lpage>756</lpage>. <pub-id pub-id-type="doi">10.1101/gr.216465.116</pub-id><pub-id pub-id-type="pmid">28320918</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kanda</surname> <given-names>K.</given-names></name> <name><surname>Nakashima</surname> <given-names>K.</given-names></name> <name><surname>Nagano</surname> <given-names>Y.</given-names></name></person-group> (<year>2015</year>). <article-title>Complete genome sequence of <italic>Bacillus thuringiensis</italic> serovar tolworthi strain Pasteur Institute Standard</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00710</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00710-15</pub-id><pub-id pub-id-type="pmid">26139717</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kearse</surname> <given-names>M.</given-names></name> <name><surname>Moir</surname> <given-names>R.</given-names></name> <name><surname>Wilson</surname> <given-names>A.</given-names></name> <name><surname>Stones-Havas</surname> <given-names>S.</given-names></name> <name><surname>Cheung</surname> <given-names>M.</given-names></name> <name><surname>Sturrock</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data</article-title>. <source>Bioinformatics</source> <volume>28</volume>, <fpage>1647</fpage>&#x02013;<lpage>1649</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/bts199</pub-id><pub-id pub-id-type="pmid">22543367</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name></person-group> (<year>2014</year>). <article-title>One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly</article-title>. <source>Curr. Opin. Microbiol.</source> <volume>23C</volume>, <fpage>110</fpage>&#x02013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.1016/j.mib.2014.11.014</pub-id></citation></ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Harhay</surname> <given-names>G.</given-names></name> <name><surname>Smith</surname> <given-names>T.</given-names></name> <name><surname>Bono</surname> <given-names>J.</given-names></name> <name><surname>Harhay</surname> <given-names>D.</given-names></name> <name><surname>Mcvey</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>Reducing assembly complexity of microbial genomes with single-molecule sequencing</article-title>. <source>Genome Biol.</source> <volume>14</volume>:<fpage>R101</fpage>. <pub-id pub-id-type="doi">10.1186/gb-2013-14-9-r101</pub-id><pub-id pub-id-type="pmid">24034426</pub-id></citation></ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koren</surname> <given-names>S.</given-names></name> <name><surname>Walenz</surname> <given-names>B. P.</given-names></name> <name><surname>Berlin</surname> <given-names>K.</given-names></name> <name><surname>Miller</surname> <given-names>J. R.</given-names></name> <name><surname>Bergman</surname> <given-names>N. H.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name></person-group> (<year>2017</year>). <article-title>Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation</article-title>. <source>Genome Res.</source> <volume>27</volume>, <fpage>722</fpage>&#x02013;<lpage>736</lpage>. <pub-id pub-id-type="doi">10.1101/gr.215087.116</pub-id><pub-id pub-id-type="pmid">28298431</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Korlach</surname> <given-names>J.</given-names></name> <name><surname>Bjornson</surname> <given-names>K. P.</given-names></name> <name><surname>Chaudhuri</surname> <given-names>B. P.</given-names></name> <name><surname>Cicero</surname> <given-names>R. L.</given-names></name> <name><surname>Flusberg</surname> <given-names>B. A.</given-names></name> <name><surname>Gray</surname> <given-names>J. J.</given-names></name> <etal/></person-group>. (<year>2010</year>). <article-title>Real-time DNA sequencing from single polymerase molecules</article-title>. <source>Methods Enzymol.</source> <volume>472</volume>, <fpage>431</fpage>&#x02013;<lpage>455</lpage>. <pub-id pub-id-type="doi">10.1016/S0076-6879(10)72001-2</pub-id><pub-id pub-id-type="pmid">20580975</pub-id></citation></ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krumsiek</surname> <given-names>J.</given-names></name> <name><surname>Arnold</surname> <given-names>R.</given-names></name> <name><surname>Rattei</surname> <given-names>T.</given-names></name></person-group> (<year>2007</year>). <article-title>Gepard: a rapid and sensitive tool for creating dotplots on genome scale</article-title>. <source>Bioinformatics</source> <volume>23</volume>, <fpage>1026</fpage>&#x02013;<lpage>1028</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btm039</pub-id><pub-id pub-id-type="pmid">17309896</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lancaster</surname> <given-names>W. A.</given-names></name> <name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Poole</surname> <given-names>F. L.</given-names></name> <name><surname>Klingeman</surname> <given-names>D. M.</given-names></name> <name><surname>Elias</surname> <given-names>D. A.</given-names></name> <name><surname>Adams</surname> <given-names>M. W.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Near-complete genome sequence of <italic>Clostridium paradoxum</italic> strain JW-YL-7</article-title>. <source>Genome Announc.</source> <volume>4</volume>:<fpage>e00229</fpage>-<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00229-16</pub-id><pub-id pub-id-type="pmid">27151784</pub-id></citation></ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lin</surname> <given-names>H. H.</given-names></name> <name><surname>Liao</surname> <given-names>Y. C.</given-names></name></person-group> (<year>2015</year>). <article-title>Evaluation and validation of assembling corrected PacBio long reads for microbial genome completion via hybrid approaches</article-title>. <source>PLoS ONE</source> <volume>10</volume>:<fpage>e0144305</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0144305</pub-id><pub-id pub-id-type="pmid">26641475</pub-id></citation></ref>
<ref id="B36">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Li</surname> <given-names>Y.</given-names></name> <name><surname>Li</surname> <given-names>S.</given-names></name> <name><surname>Hu</surname> <given-names>N.</given-names></name> <name><surname>He</surname> <given-names>Y.</given-names></name> <name><surname>Pong</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>Comparison of next-generation sequencing systems</article-title>. <source>J. Biomed. Biotechnol.</source> <volume>2012</volume>:<fpage>251364</fpage>. <pub-id pub-id-type="doi">10.1155/2012/251364</pub-id><pub-id pub-id-type="pmid">22829749</pub-id></citation></ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Magoc</surname> <given-names>T.</given-names></name> <name><surname>Pabinger</surname> <given-names>S.</given-names></name> <name><surname>Canzar</surname> <given-names>S.</given-names></name> <name><surname>Liu</surname> <given-names>X.</given-names></name> <name><surname>Su</surname> <given-names>Q.</given-names></name> <name><surname>Puiu</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2013</year>). <article-title>GAGE-B: an evaluation of genome assemblers for bacterial organisms</article-title>. <source>Bioinformatics</source> <volume>29</volume>, <fpage>1718</fpage>&#x02013;<lpage>1725</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btt273</pub-id><pub-id pub-id-type="pmid">23665771</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Margulies</surname> <given-names>M.</given-names></name> <name><surname>Egholm</surname> <given-names>M.</given-names></name> <name><surname>Altman</surname> <given-names>W. E.</given-names></name> <name><surname>Attiya</surname> <given-names>S.</given-names></name> <name><surname>Bader</surname> <given-names>J. S.</given-names></name> <name><surname>Bemben</surname> <given-names>L. A.</given-names></name> <etal/></person-group>. (<year>2005</year>). <article-title>Genome sequencing in microfabricated high-density picolitre reactors</article-title>. <source>Nature</source> <volume>437</volume>, <fpage>376</fpage>&#x02013;<lpage>380</lpage>. <pub-id pub-id-type="doi">10.1038/nature03959</pub-id><pub-id pub-id-type="pmid">16056220</pub-id></citation></ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mehnaz</surname> <given-names>S.</given-names></name> <name><surname>Bauer</surname> <given-names>J. S.</given-names></name> <name><surname>Gross</surname> <given-names>H.</given-names></name></person-group> (<year>2014</year>). <article-title>Complete genome sequence of the sugar cane endophyte <italic>Pseudomonas aurantiaca</italic> PB-St2, a disease-suppressive bacterium with antifungal activity toward the plant pathogen <italic>Colletotrichum falcatum</italic></article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e01108</fpage>-<lpage>13</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01108-13</pub-id><pub-id pub-id-type="pmid">24459254</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mrazek</surname> <given-names>J.</given-names></name> <name><surname>Chaudhari</surname> <given-names>T.</given-names></name> <name><surname>Basu</surname> <given-names>A.</given-names></name></person-group> (<year>2011</year>). <article-title>PerPlot &#x00026; PerScan: tools for analysis of DNA curvature-related periodicity in genomic nucleotide sequences</article-title>. <source>Microb. Inform. Exp.</source> <volume>1</volume>:<fpage>13</fpage>. <pub-id pub-id-type="doi">10.1186/2042-5783-1-13</pub-id><pub-id pub-id-type="pmid">22587738</pub-id></citation></ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagarajan</surname> <given-names>N.</given-names></name> <name><surname>Pop</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Sequence assembly demystified</article-title>. <source>Nat. Rev. Genet.</source> <volume>14</volume>, <fpage>157</fpage>&#x02013;<lpage>167</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3367</pub-id><pub-id pub-id-type="pmid">23358380</pub-id></citation></ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nakano</surname> <given-names>K.</given-names></name> <name><surname>Terabayashi</surname> <given-names>Y.</given-names></name> <name><surname>Shiroma</surname> <given-names>A.</given-names></name> <name><surname>Shimoji</surname> <given-names>M.</given-names></name> <name><surname>Tamotsu</surname> <given-names>H.</given-names></name> <name><surname>Ashimine</surname> <given-names>N.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>First complete genome sequence of <italic>Clostridium sporogenes</italic> DSM 795T, a nontoxigenic surrogate for <italic>Clostridium botulinum</italic>, determined using PacBio Single-Molecule Real-Time Technology</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00832</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00832-15</pub-id><pub-id pub-id-type="pmid">26227598</pub-id></citation></ref>
<ref id="B43">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>O&#x00027;Dell</surname> <given-names>K. B.</given-names></name> <name><surname>Woo</surname> <given-names>H. L.</given-names></name> <name><surname>Utturkar</surname> <given-names>S.</given-names></name> <name><surname>Klingeman</surname> <given-names>D.</given-names></name> <name><surname>Brown</surname> <given-names>S. D.</given-names></name> <name><surname>Hazen</surname> <given-names>T. C.</given-names></name></person-group> (<year>2015</year>). <article-title>Genome sequence of <italic>Halomonas</italic> sp. strain KO116, an Ionic liquid-tolerant marine bacterium isolated from a lignin-enriched seawater microcosm</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00402</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00402-15</pub-id><pub-id pub-id-type="pmid">25953187</pub-id></citation></ref>
<ref id="B44">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Okutani</surname> <given-names>A.</given-names></name> <name><surname>Osaki</surname> <given-names>M.</given-names></name> <name><surname>Takamatsu</surname> <given-names>D.</given-names></name> <name><surname>Kaku</surname> <given-names>Y.</given-names></name> <name><surname>Inoue</surname> <given-names>S.</given-names></name> <name><surname>Morikawa</surname> <given-names>S.</given-names></name></person-group> (<year>2015</year>). <article-title>Draft genome sequences of <italic>Bacillus anthracis</italic> strains stored for several decades in Japan</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00633</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00633-15</pub-id><pub-id pub-id-type="pmid">26089418</pub-id></citation></ref>
<ref id="B45">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Otto</surname> <given-names>T. D.</given-names></name> <name><surname>Sanders</surname> <given-names>M.</given-names></name> <name><surname>Berriman</surname> <given-names>M.</given-names></name> <name><surname>Newbold</surname> <given-names>C.</given-names></name></person-group> (<year>2010</year>). <article-title>Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology</article-title>. <source>Bioinformatics</source> <volume>26</volume>, <fpage>1704</fpage>&#x02013;<lpage>1707</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq269</pub-id><pub-id pub-id-type="pmid">20562415</pub-id></citation></ref>
<ref id="B46">
<citation citation-type="web"><person-group person-group-type="author"><collab>Pacific-Biosciences</collab></person-group> (<year>2014a</year>). <source>HGAP in SMRT Analysis</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-in-SMRT-Analysis">https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP-in-SMRT-Analysis</ext-link></citation></ref>
<ref id="B47">
<citation citation-type="web"><person-group person-group-type="author"><collab>Pacific-Biosciences</collab></person-group> (<year>2014b</year>). <source>SMRT Analysis Release Notes v2.2.0</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Analysis-Release-Notes-v2.2.0">https://github.com/PacificBiosciences/SMRT-Analysis/wiki/SMRT-Analysis-Release-Notes-v2.2.0</ext-link></citation></ref>
<ref id="B48">
<citation citation-type="web"><person-group person-group-type="author"><collab>Pacific-Biosciences</collab></person-group> (<year>2015</year>). <source>Circularizing and Trimming</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Circularizing-and-trimming">https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Circularizing-and-trimming</ext-link></citation></ref>
<ref id="B49">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Page</surname> <given-names>A. J.</given-names></name> <name><surname>De Silva</surname> <given-names>N.</given-names></name> <name><surname>Hunt</surname> <given-names>M.</given-names></name> <name><surname>Quail</surname> <given-names>M. A.</given-names></name> <name><surname>Parkhill</surname> <given-names>J.</given-names></name> <name><surname>Harris</surname> <given-names>S. R.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Robust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data</article-title>. <source>Microbial Genomics</source> <volume>2</volume>:<fpage>e000083</fpage>. <pub-id pub-id-type="doi">10.1099/mgen.0.000083</pub-id><pub-id pub-id-type="pmid">28348874</pub-id></citation></ref>
<ref id="B50">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pyne</surname> <given-names>M. E.</given-names></name> <name><surname>Utturkar</surname> <given-names>S.</given-names></name> <name><surname>Brown</surname> <given-names>S. D.</given-names></name> <name><surname>Moo-Young</surname> <given-names>M.</given-names></name> <name><surname>Chung</surname> <given-names>D. A.</given-names></name> <name><surname>Chou</surname> <given-names>C. P.</given-names></name></person-group> (<year>2014</year>). <article-title>Improved draft genome sequence of <italic>Clostridium pasteurianum</italic> strain ATCC 6013 (DSM 525) using a hybrid Next-Generation Sequencing approach</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00790</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00790-14</pub-id><pub-id pub-id-type="pmid">25103768</pub-id></citation></ref>
<ref id="B51">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Quail</surname> <given-names>M. A.</given-names></name> <name><surname>Smith</surname> <given-names>M.</given-names></name> <name><surname>Coupland</surname> <given-names>P.</given-names></name> <name><surname>Otto</surname> <given-names>T. D.</given-names></name> <name><surname>Harris</surname> <given-names>S. R.</given-names></name> <name><surname>Connor</surname> <given-names>T. R.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers</article-title>. <source>BMC Genomics</source> <volume>13</volume>:<fpage>341</fpage>. <pub-id pub-id-type="doi">10.1186/1471-2164-13-341</pub-id><pub-id pub-id-type="pmid">22827831</pub-id></citation></ref>
<ref id="B52">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rainey</surname> <given-names>F. A.</given-names></name> <name><surname>Ward-Rainey</surname> <given-names>N. L.</given-names></name> <name><surname>Janssen</surname> <given-names>P. H.</given-names></name> <name><surname>Hippe</surname> <given-names>H.</given-names></name> <name><surname>Stackebrandt</surname> <given-names>E.</given-names></name></person-group> (<year>1996</year>). <article-title><italic>Clostridium paradoxum</italic> DSM 7308T contains multiple 16S rRNA genes with heterogeneous intervening sequences</article-title>. <source>Microbiology</source> <volume>142</volume> (<issue>Pt. 8</issue>), <fpage>2087</fpage>&#x02013;<lpage>2095</lpage>. <pub-id pub-id-type="doi">10.1099/13500872-142-8-2087</pub-id><pub-id pub-id-type="pmid">8760921</pub-id></citation></ref>
<ref id="B53">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rhoads</surname> <given-names>A.</given-names></name> <name><surname>Au</surname> <given-names>K. F.</given-names></name></person-group> (<year>2015</year>). <article-title>PacBio sequencing and its applications</article-title>. <source>Genomics Proteomics Bioinformatics</source> <volume>13</volume>, <fpage>278</fpage>&#x02013;<lpage>289</lpage>. <pub-id pub-id-type="doi">10.1016/j.gpb.2015.08.002</pub-id><pub-id pub-id-type="pmid">26542840</pub-id></citation></ref>
<ref id="B54">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Risse</surname> <given-names>J.</given-names></name> <name><surname>Thomson</surname> <given-names>M.</given-names></name> <name><surname>Patrick</surname> <given-names>S.</given-names></name> <name><surname>Blakely</surname> <given-names>G.</given-names></name> <name><surname>Koutsovoulos</surname> <given-names>G.</given-names></name> <name><surname>Blaxter</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>A single chromosome assembly of <italic>Bacteroides fragilis</italic> strain BE1 from Illumina and MinION nanopore sequencing data</article-title>. <source>Gigascience</source> <volume>4</volume>, <fpage>1</fpage>&#x02013;<lpage>7</lpage>. <pub-id pub-id-type="doi">10.1186/s13742-015-0101-6</pub-id><pub-id pub-id-type="pmid">26640692</pub-id></citation></ref>
<ref id="B55">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname> <given-names>R. J.</given-names></name> <name><surname>Carneiro</surname> <given-names>M. O.</given-names></name> <name><surname>Schatz</surname> <given-names>M. C.</given-names></name></person-group> (<year>2013</year>). <article-title>The advantages of SMRT sequencing</article-title>. <source>Genome Biol.</source> <volume>14</volume>:<fpage>405</fpage>. <pub-id pub-id-type="doi">10.1186/gb-2013-14-6-405</pub-id><pub-id pub-id-type="pmid">23822731</pub-id></citation></ref>
<ref id="B56">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roberts</surname> <given-names>R. J.</given-names></name> <name><surname>Vincze</surname> <given-names>T.</given-names></name> <name><surname>Posfai</surname> <given-names>J.</given-names></name> <name><surname>Macelis</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>REBASE&#x02013;a database for DNA restriction and modification: enzymes, genes and genomes</article-title>. <source>Nucleic Acids Res.</source> <volume>43</volume>(Database issue):<fpage>D298</fpage>&#x02013;<lpage>D299</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gku1046</pub-id><pub-id pub-id-type="pmid">25378308</pub-id></citation></ref>
<ref id="B57">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Salzberg</surname> <given-names>S. L.</given-names></name> <name><surname>Phillippy</surname> <given-names>A. M.</given-names></name> <name><surname>Zimin</surname> <given-names>A.</given-names></name> <name><surname>Puiu</surname> <given-names>D.</given-names></name> <name><surname>Magoc</surname> <given-names>T.</given-names></name> <name><surname>Koren</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2012</year>). <article-title>GAGE: a critical evaluation of genome assemblies and assembly algorithms</article-title>. <source>Genome Res.</source> <volume>22</volume>, <fpage>557</fpage>&#x02013;<lpage>567</lpage>. <pub-id pub-id-type="doi">10.1101/gr.131383.111</pub-id><pub-id pub-id-type="pmid">22147368</pub-id></citation></ref>
<ref id="B58">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Satou</surname> <given-names>K.</given-names></name> <name><surname>Shiroma</surname> <given-names>A.</given-names></name> <name><surname>Teruya</surname> <given-names>K.</given-names></name> <name><surname>Shimoji</surname> <given-names>M.</given-names></name> <name><surname>Nakano</surname> <given-names>K.</given-names></name> <name><surname>Juan</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Complete genome sequences of eight <italic>Helicobacter pylori</italic> strains with different virulence factor genotypes and methylation profiles, isolated from patients with diverse gastrointestinal diseases on Okinawa Island, Japan, determined using PacBio Single-Molecule Real-Time Technology</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00286</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00286-14</pub-id><pub-id pub-id-type="pmid">24744331</pub-id></citation></ref>
<ref id="B59">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shapiro</surname> <given-names>L. R.</given-names></name> <name><surname>Scully</surname> <given-names>E. D.</given-names></name> <name><surname>Roberts</surname> <given-names>D.</given-names></name> <name><surname>Straub</surname> <given-names>T. J.</given-names></name> <name><surname>Geib</surname> <given-names>S. M.</given-names></name> <name><surname>Park</surname> <given-names>J.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Draft genome sequence of <italic>Erwinia tracheiphila</italic>, an economically important bacterial pathogen of cucurbits</article-title>. <source>Genome Announc.</source> <volume>3</volume>:<fpage>e00482</fpage>-<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00482-15</pub-id><pub-id pub-id-type="pmid">26044415</pub-id></citation></ref>
<ref id="B60">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simpson</surname> <given-names>J. T.</given-names></name> <name><surname>Wong</surname> <given-names>K.</given-names></name> <name><surname>Jackman</surname> <given-names>S. D.</given-names></name> <name><surname>Schein</surname> <given-names>J. E.</given-names></name> <name><surname>Jones</surname> <given-names>S. J.</given-names></name> <name><surname>Birol</surname> <given-names>I.</given-names></name></person-group> (<year>2009</year>). <article-title>ABySS: a parallel assembler for short read sequence data</article-title>. <source>Genome Res.</source> <volume>19</volume>, <fpage>1117</fpage>&#x02013;<lpage>1123</lpage>. <pub-id pub-id-type="doi">10.1101/gr.089532.108</pub-id><pub-id pub-id-type="pmid">19251739</pub-id></citation></ref>
<ref id="B61">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swain</surname> <given-names>M. T.</given-names></name> <name><surname>Tsai</surname> <given-names>I. J.</given-names></name> <name><surname>Assefa</surname> <given-names>S. A.</given-names></name> <name><surname>Newbold</surname> <given-names>C.</given-names></name> <name><surname>Berriman</surname> <given-names>M.</given-names></name> <name><surname>Otto</surname> <given-names>T. D.</given-names></name></person-group> (<year>2012</year>). <article-title>A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs</article-title>. <source>Nat. Protoc.</source> <volume>7</volume>, <fpage>1260</fpage>&#x02013;<lpage>1284</lpage>. <pub-id pub-id-type="doi">10.1038/nprot.2012.068</pub-id><pub-id pub-id-type="pmid">22678431</pub-id></citation></ref>
<ref id="B62">
<citation citation-type="web"><person-group person-group-type="author"><collab>The NCTC 3000 Project</collab></person-group> (<year>2016</year>). <source>The NCTC 3000 Project: Public Health England Reference Collections - Wellcome Trust Sanger Institute</source> (Accessed July 25, 2016). Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.sanger.ac.uk/resources/downloads/bacteria/nctc/">http://www.sanger.ac.uk/resources/downloads/bacteria/nctc/</ext-link></citation></ref>
<ref id="B63">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Thomma</surname> <given-names>B. P.</given-names></name> <name><surname>Seidl</surname> <given-names>M. F.</given-names></name> <name><surname>Shi-Kunne</surname> <given-names>X.</given-names></name> <name><surname>Cook</surname> <given-names>D. E.</given-names></name> <name><surname>Bolton</surname> <given-names>M. D.</given-names></name> <name><surname>van Kan</surname> <given-names>J. A.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Mind the gap; seven reasons to close fragmented genome assemblies</article-title>. <source>Fungal Genet. Biol.</source> <volume>90</volume>, <fpage>24</fpage>&#x02013;<lpage>30</lpage>. <pub-id pub-id-type="doi">10.1016/j.fgb.2015.08.010</pub-id><pub-id pub-id-type="pmid">26342853</pub-id></citation></ref>
<ref id="B64">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tong</surname> <given-names>H.</given-names></name> <name><surname>Mrazek</surname> <given-names>J.</given-names></name></person-group> (<year>2014</year>). <article-title>Investigating the interplay between nucleoid-associated proteins, DNA curvature, and CRISPR elements using comparative genomics</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e90940</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0090940</pub-id><pub-id pub-id-type="pmid">24595272</pub-id></citation></ref>
<ref id="B65">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Treangen</surname> <given-names>T. J.</given-names></name> <name><surname>Salzberg</surname> <given-names>S. L.</given-names></name></person-group> (<year>2012</year>). <article-title>Repetitive DNA and next-generation sequencing: computational challenges and solutions</article-title>. <source>Nat. Rev. Genet.</source> <volume>13</volume>, <fpage>36</fpage>&#x02013;<lpage>46</lpage>. <pub-id pub-id-type="doi">10.1038/nrg3164</pub-id><pub-id pub-id-type="pmid">22124482</pub-id></citation></ref>
<ref id="B66">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Bayer</surname> <given-names>E. A.</given-names></name> <name><surname>Borovok</surname> <given-names>I.</given-names></name> <name><surname>Lamed</surname> <given-names>R.</given-names></name> <name><surname>Hurt</surname> <given-names>R. A.</given-names></name> <name><surname>Land</surname> <given-names>M. L.</given-names></name> <etal/></person-group>. (<year>2016</year>). <article-title>Application of long sequence reads to improve genomes for <italic>Clostridium thermocellum</italic> AD2, <italic>Clostridium thermocellum</italic> LQRI, and <italic>Pelosinus fermentans</italic> R7</article-title>. <source>Genome Announc.</source> <volume>4</volume>:<fpage>e01043</fpage>-<lpage>16</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.01043-16</pub-id><pub-id pub-id-type="pmid">27688341</pub-id></citation></ref>
<ref id="B67">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Klingeman</surname> <given-names>D. M.</given-names></name> <name><surname>Bruno-Barcena</surname> <given-names>J. M.</given-names></name> <name><surname>Chinn</surname> <given-names>M. S.</given-names></name> <name><surname>Grunden</surname> <given-names>A. M.</given-names></name> <name><surname>Kopke</surname> <given-names>M.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>Sequence data for <italic>Clostridium autoethanogenum</italic> using three generations of sequencing technologies</article-title>. <source>Sci. Data</source> <volume>2</volume>:<fpage>150014</fpage>. <pub-id pub-id-type="doi">10.1038/sdata.2015.14</pub-id><pub-id pub-id-type="pmid">25977818</pub-id></citation></ref>
<ref id="B68">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Utturkar</surname> <given-names>S. M.</given-names></name> <name><surname>Klingeman</surname> <given-names>D. M.</given-names></name> <name><surname>Land</surname> <given-names>M. L.</given-names></name> <name><surname>Schadt</surname> <given-names>C. W.</given-names></name> <name><surname>Doktycz</surname> <given-names>M. J.</given-names></name> <name><surname>Pelletier</surname> <given-names>D. A.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Evaluation and validation of <italic>de novo</italic> and hybrid assembly techniques to derive high quality genome sequences</article-title>. <source>Bioinformatics</source> <volume>30</volume>, <fpage>2709</fpage>&#x02013;<lpage>2716</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btu391</pub-id><pub-id pub-id-type="pmid">24930142</pub-id></citation></ref>
<ref id="B69">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Dijk</surname> <given-names>E. L.</given-names></name> <name><surname>Auger</surname> <given-names>H.</given-names></name> <name><surname>Jaszczyszyn</surname> <given-names>Y.</given-names></name> <name><surname>Thermes</surname> <given-names>C.</given-names></name></person-group> (<year>2014</year>). <article-title>Ten years of next-generation sequencing technology</article-title>. <source>Trends Genet.</source> <volume>30</volume>, <fpage>418</fpage>&#x02013;<lpage>426</lpage>. <pub-id pub-id-type="doi">10.1016/j.tig.2014.07.001</pub-id><pub-id pub-id-type="pmid">25108476</pub-id></citation></ref>
<ref id="B70">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Walker</surname> <given-names>B. J.</given-names></name> <name><surname>Abeel</surname> <given-names>T.</given-names></name> <name><surname>Shea</surname> <given-names>T.</given-names></name> <name><surname>Priest</surname> <given-names>M.</given-names></name> <name><surname>Abouelliel</surname> <given-names>A.</given-names></name> <name><surname>Sakthikumar</surname> <given-names>S.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement</article-title>. <source>PLoS ONE</source> <volume>9</volume>:<fpage>e112963</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0112963</pub-id><pub-id pub-id-type="pmid">25409509</pub-id></citation></ref>
<ref id="B71">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Woo</surname> <given-names>H. L.</given-names></name> <name><surname>Utturkar</surname> <given-names>S.</given-names></name> <name><surname>Klingeman</surname> <given-names>D.</given-names></name> <name><surname>Simmons</surname> <given-names>B. A.</given-names></name> <name><surname>DeAngelis</surname> <given-names>K. M.</given-names></name> <name><surname>Brown</surname> <given-names>S. D.</given-names></name> <etal/></person-group>. (<year>2014</year>). <article-title>Draft genome sequence of the lignin-degrading <italic>Burkholderia</italic> sp. strain LIG30, isolated from wet tropical forest soil</article-title>. <source>Genome Announc.</source> <volume>2</volume>:<fpage>e00637</fpage>-<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1128/genomeA.00637-14</pub-id><pub-id pub-id-type="pmid">24948777</pub-id></citation></ref>
<ref id="B72">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zuker</surname> <given-names>M.</given-names></name></person-group> (<year>2003</year>). <article-title>Mfold web server for nucleic acid folding and hybridization prediction</article-title>. <source>Nucleic Acids Res.</source> <volume>31</volume>, <fpage>3406</fpage>&#x02013;<lpage>3415</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkg595</pub-id><pub-id pub-id-type="pmid">12824337</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> This work was supported by the Plant-Microbe Interfaces Scientific Focus Area and the BioEnergy Science Center, a U.S. DOE Bioenergy Research Center, in the Genomic Science Program, the Office of Biological and Environmental Research in the U.S. Department of Energy Office of Science. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the United States Department of Energy under contract DE-AC05-00OR22725.</p>
</fn>
</fn-group>
</back>
</article>