Reclassification of Paenibacillus riograndensis as a Genomovar of Paenibacillus sonchi: Genome-Based Metrics Improve Bacterial Taxonomic Classification

Species from the genus Paenibacillus are widely studied due to their biotechnological relevance. Dozens of novel species descriptions of this genus were published in the last couple of years, but few utilized genomic data as classification criteria. Here, we demonstrate the importance of using genome-based metrics and phylogenetic analyses to identify and classify Paenibacillus strains. For this purpose, Paenibacillus riograndensis SBR5T, Paenibacillus sonchi X19-5T, and their close relatives were compared through phenotypic, genotypic, and genomic approaches. With respect to P. sonchi X19-5T, P. riograndensis SBR5T, Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34 presented ANI (average nucleotide identity) values ranging from 95.61 to 96.32%, gANI (whole-genome average nucleotide identity) values ranging from 96.78 to 97.31%, and dDDH (digital DNA–DNA hybridization) values ranging from 68.2 to 73.2%. Phylogenetic analyses of 16S rRNA, gyrB, recA, recN, and rpoB genes and concatenated proteins supported the monophyletic origin of these Paenibacillus strains. Therefore, we propose to assign Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 to P. sonchi species, and reclassify P. riograndensis SBR5T as a later heterotypic synonym of P. sonchi (type strain X19-5T), with the creation of three novel genomovars, P. sonchi genomovar Sonchi (type strain X19-5T), P. sonchi genomovar Riograndensis (type strain SBR5T), P. sonchi genomovar Oryzarum (type strain CAS34T = DSM 102041T; = BR10511T).


INTRODUCTION
The genus Paenibacillus includes nitrogen-fixing species that were isolated from roots of wheat (Beneduzi et al., 2010), maize (Berge et al., 2002), rice (Beneduzi et al., 2008;de Souza et al., 2014), and other plants (Ambrosini et al., 2015). Many of these bacterial isolates arose from the search for strains that exert positive effects on the development of agricultural plants. The inoculation of plants or seeds with these plant-growth promoting bacteria (PGPB) can help to reduce the use of chemical fertilizers and pesticides, improving agricultural sustainability (Beneduzi et al., 2008;Seldin, 2011).
In 2010, the species Paenibacillus riograndensis (Beneduzi et al., 2010) and Paenibacillus sonchi (Hong et al., 2009) were described in an interval of only 9 days, according to electronic publication dates. P. riograndensis is a nitrogen fixer and phytormone producer (Beneduzi et al., 2010), which in later studies was found to be very closely related to P. sonchi, although their metabolic repertoire seemed to be distinct (Jin et al., 2011). Recently, the complete genome sequence of P. riograndensis SBR5 T (Brito et al., 2015) and a draft genome sequence of P. sonchi X19-5 T (Xie et al., 2014) were determined, and preliminary computation of their ANI (average nucleotide identity) suggested that they belong to the same species.
Genomic data are portable (i.e., data obtained from different laboratories can be compared), and highly informative with respect to evolutionary relationships, which are desired features for any taxonomic scheme. Although highthroughput sequencing technologies facilitated access to genome sequences, taxonomic reports rarely present this kind of data. For example, since 2016, from more than 20 novel Paenibacillus species descriptions in the International Journal of Systematic and Evolutionary Microbiology, only one of them utilized genomic analyses (Liu et al., 2016).
Here, we demonstrate the importance of using genomic analyses to clarify the taxonomic assignment of P. riograndensis and P. sonchi. For this purpose, we compared P. riograndensis SBR5 T , P. sonchi X19-5 T , and their close relatives, Paenibacillus graminis DSM 15220 T , Paenibacillus jilunlii DSM 23019 T , and two strains preliminarily identified as P. riograndensis/P. sonchi (Paenibacillus sp. CAS34 and Paenibacillus sp. CAR114) through genomic, genotypic, and phenotypic approaches.

Multivariate Analyses Based on Phenotypic Data
Principal component analysis (PCA) was used to verify the statistical variance-covariance of cellular fatty acids among five Paenibacillus species through PAST software (Hammer et al., 2001).

Genome Sequences
All genomes utilized in this study are listed in Supplementary  Table S1. Genome sequences of P. jilunlii DSM 23019 T , Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 were obtained using the Illumina Miseq platform. Total DNA was extracted as described elsewhere (Ambrosini et al., 2012). Libraries were constructed using Nextera XT kit and sequenced with MiSeq Reagent Kit V3 (2 × 300 bp). Draft genomes were assembled using SPAdes version 3.5 (careful option and k-mer values equal to 21,33,55,77,99,127) (Bankevich et al., 2012), and annotated using the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP), which combines Hidden Markov Model (HMM)-based gene prediction methods with homology-based methods (Angiuoli et al., 2008).
R2cat (Husemann and Stoye, 2010) is a contig arrangement tool used to order a set of contigs with respect to a single reference genome, in which mapping of the contigs onto the reference is based on a q-gram filter. The results are visualized by plotting contigs onto the reference sequence of a complete genome. EDGAR (Efficient Database framework for comparative Genome Analyses using BLAST score Ratios) was utilized to automatically perform genome comparisons in a high throughput approach (Blom et al., 2009). EDGAR generates synteny plots, describing the physical co-localization of genes, which may change order during evolution by Biochemical data obtained in this study. The results of biochemical tests are shown as positive "+," negative "−," or variable "V." Results obtained in our study that contradicted those obtained in other reports are highlighted in gray (more details can be found in Supplementary Table S2).
Maximum-likelihood phylogenies were conducted with the Phylogeny.fr platform with the PhyML program (v3.0 aLRT) (Dereeper et al., 2008). GTR (Generalized Time Reversible) substitution model was selected assuming an estimated proportion of invariant sites and four gamma-distributed rate categories to account for rate heterogeneity across sites. Gamma shape parameters were estimated directly from the data. Reliability for internal branching was assessed using the aLRT (approximate Likelihood Ratio Test) (Anisimova and Gascuel, 2006). All phylogenetic trees were rooted using P. polymyxa ATCC 842 T as outgroup. The 16S rRNA gene phylogenetic tree was pruned using Newick Utilities (Junier and Zdobnov, 2010), maintaining strains closely related to the P. riograndensis and P. sonchi species.

Core-Proteome and AMPHORA Multiprotein Phylogenetic Reconstructions
Ortholog protein groups were defined using bidirectional best hits algorithm implemented in Get_homologues build 20170609. For this purpose, the core-proteome was compiled using minimum BLAST searches and clusters containing inparalogs were excluded. Each of the 1102 single-copy proteins was aligned with MUSCLE software using default parameters. Alignments were concatenated with Phyutility (Smith and Dunn, 2008). Phylogenetic tree of the core-proteome was reconstructed with Mega 6 software build 6140226 (Tamura et al., 2013), using Neighbor Joining approach with Jones-Taylor-Thornton substitution model, deleting positions containing gaps and using 500 bootstrap replicates.
Thirty-one marker proteins were detected in the genomes using Amphora pipeline (Wu and Eisen, 2008) implemented at the AmphoraNet (Kerepesi et al., 2014). Since our analysis included draft genome sequences, few protein sequences were found fragmented in more than one contig. Therefore, if more than one hit for a protein was found in a genome, the largest one was kept for subsequent analyses in order to maximize the information for phylogenetic inference. All Amphora proteins are listed in Supplementary Table  S3. Proteins were aligned and concatenated as described above. Aligned positions were curated using Gblocks (Castresana, 2000) with default parameters (implemented in Phylogeny.fr pipeline). The concatenated alignment without gaps containing 6289 positions. Phylogenetic analysis of FIGURE 1 | 16S rRNA gene phylogeny of Paenibacillus species. The 16S rRNA gene rooted tree was constructed using the maximum-likelihood method. aLRT values greater than 70% are shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. P. polymyxa is the outgroup. Bacteria of P. riograndensis/P. sonchi clade are in bold. This tree is the same of Supplementary Figure S2, although only taxa of interest were kept.
AMPHORA concatenated multiprotein sequence was performed with the Phylogeny.fr platform as described in the previous section, although the WAG substitution model was utilized in this case.
Both phylogenies were rooted using P. polymyxa ATCC 842 T as the outgroup.

Phenotypic Analyses
The type strains of P. graminis and P. jilunlii, and two strains, Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34, which are all closely related to P. riograndensis and P. sonchi, had their phenotypic characteristics evaluated. Table 1 and  Supplementary Table S4 show the biochemical capabilities of these bacteria. Concerning the tests performed, P. sonchi X19-5 T and P. riograndensis SBR5 T only diverged with respect to starch degradation.
As demonstrated in Supplementary Table S5, Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 were able to fix nitrogen and synthesize indolic compounds as P. sonchi X19-5 T and P. riograndensis SBR5 T .
Considering each species at a time in the scatter plot generated from multivariate analysis of the fatty acid profiles, the points representing independent profiles of P. graminis DSM 15220 T , P. jilunlii DSM 23019 T and P. sonchi X19-5 T were more distant from each other than those from the other Paenibacillus strains (Supplementary Figure S1 and Table S6).

16S rRNA Gene Analyses
The 16S rRNA genes were analyzed concerning their identity levels and phylogenetic history. P. sonchi, P. riograndensis, P. graminis, and P. jilunlii strains shared 16S rRNA gene identity values higher or equal than the species cutoff (Supplementary  Table S7). In the 16S rRNA gene phylogeny containing all sequences from Paenibacillus type-species deposited in RDP, P. sonchi X19-5 T , P. riograndensis SBR5 T , Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 formed a clade (Figure 1 and Supplementary Figure S2). However, no subclades were present in this monophyletic group. It is worth noting that both P. riograndensis SBR5 T and P. graminis DSM 15220 T have multiple copies of 16S rRNA gene (9 and 10, respectively) in their complete sequenced genomes (not all copies were available from strains with draft genomes). Even though 16S rRNA gene paralogs were distinct, they were not variable enough to be dispersed in different clades (Supplementary Figure S2).

Genomic Analyses
For some genomic analyses, we also considered other Paenibacillus strains from sister groups of P. riograndensis-P. sonchi-P. graminis-P. jilunlii cluster, whose genome sequences were publicly available (Supplementary Table S1).
The G+C content of Paenibacillus strains ranged from 44.2 to 53.5%, although the values among P. graminis DSM 15220 T , P. jilunlii DSM 23019 T , P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34 did not vary more than 1 percentual point (Supplementary Table  S1). In order to analyze genome structural conservation in selected Paenibacillus species, we generated synteny plots to FIGURE 3 | Comparison of predicted proteins of Paenibacillus spp. against the P. riograndensis SBR5 translated genome sequence. The innermost black line circle represents the P. riograndensis SBR5 genome sequence. Each one of the outer rings represents a Paenibacillus strain, as depicted in the legend box. Colored regions of each ring symbolize tblastn hits with at least 30% of identity.
compare the complete genome sequence of P. riograndensis SBR5 T to those from P. graminis DSM 15220 T and Paenibacillus sp. HW567 (Figure 2A). Also, the draft genome sequences of P. sonchi X19-5 T , Paenibacillus sp. CAR114, Paenibacillus sp. CAS34, P. graminis DSM 15220 T , P. jilunlii DSM 23019 T , and P. polymyxa ATCC 842 T were mapped to the complete genome sequence of P. riograndensis SBR5 T (Figure 2B). With the exception of P. polymyxa ATCC 842 T graph, the plots presented long diagonal lines, interrupted only by few short gaps (Figures 2A,B). The content of homolog proteins of the Paenibacillus strains in relation to the translated genome sequence of P. riograndensis SBR5 T was also verified (Figure 3). Closely related strains, represented by the six inner rings in Figure 3, presented more color dense regions than other strains. On the other hand, the 10 outer rings of Figure 3, representing relatively more distant strains in relation to P. riograndensis SBR5 T , presented more gaps.
Although the dDDH value of Paenibacillus sp. CAR114 versus P. sonchi X19-5 T was 68.2%, the superior limit of the confidence interval (71.12%) surpassed the dDDH species threshold of 70% ( Table 2). Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 presented dDDH values higher than the subspecies FIGURE 4 | Relationship between ANI and 16S rRNA gene identity levels among Paenibacillus species. Each red circle represents a comparison between two Paenibacillus species. Traced line shows the 16S rRNA gene identity threshold for species demarcation. Gray background represents the ANI range for species demarcation. All six red circles over the gray background are pairwise comparisons of P. riograndensis, P. sonchi, Paenibacillus sp. CAR114, and Paenibacillus CAS34.
circumscription threshold of 79% between themselves, but lower than that in relation to P. riograndensis SBR5 T and P. sonchi X19-5 T (Supplementary Table S10). Furthermore, P. riograndensis SBR5 T and P. sonchi X19-5 T presented 73.2% of dDDH, below the subspecies threshold ( Table 2). Figure 4 and Supplementary Table S11 show the correspondence between the 16S rRNA gene identity values and ANI values computed in a pairwise manner. Dozens of strain-strain comparisons presented 16S rRNA gene identity values higher than the species threshold for this marker, although their ANI values were below 95%. Only those comparisons among P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114 and Paenibacillus CAS34 presented values above the ANI and 16S rRNA gene identity thresholds.

Phylogenetic Analyses
The phylogenetic history of P. sonchi, P. riograndensis, and their closely related species was also reconstructed utilizing gyrB, recA, recN and rpoB genes, a combined dataset of 31 marker proteins (AMPHORA pipeline) and the concatenated core-proteome. The clade composed of P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114 and Paenibacillus CAS34 was consistent in all of these phylogenetic reconstructions (Figures 5-7), as well as in the 16S rRNA gene phylogeny (Figure 1). Furthermore, with the exception of recA phylogeny, all trees contained a subclade of Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34, which grouped with P. riograndensis SBR5 T . However, this latter grouping pattern was only supported in the rpoB gene and core-proteome phylogenies.

DISCUSSION
Polyphasic taxonomy relies on phenotypic and genotypic analyses to identify and classify bacterial specimens. In line with this principle, we investigated different characteristics of P. riograndensis and P. sonchi in order to clarify their taxonomic statuses. For this purpose, we performed comparative analyses using closely related species to P. riograndensis and P. sonchi as references. As observed by Sutcliffe et al. (2012), most publications describing new prokaryotic taxa are based on only one representative strain. To overcome this limitation, we included in our analyses the strains Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34, preliminarily identified as P. sonchi/P. riograndensis based on 16S rRNA gene comparisons. Both strains were isolated from rice rhizosphere (de Souza et al., 2014), and were able to fix nitrogen and synthesize indolic compounds as P. sonchi X19-5 T and P. riograndensis SBR5 T . These features are commonly implicated with plant-growth promotion (Lugtenberg and Kamilova, 2009).  For biochemical profiling, we selected tests in which P. sonchi X19-5 T and P. riograndensis SBR5 T were initially distinguishable, based on their species description reports. However, in our experiments not only P. sonchi X19-5 T and P. riograndensis SBR5 T presented similar biochemical profiles, but also all strains evaluated did (variation among strains occurred in few tests). In fact, our results are in contrast to some previously published biochemical profiles of Paenibacillus species. For instance, from all Paenibacillus strains considered here, only P. sonchi X19-5 T would be unable to produce acid from D-glucose, Dxylose, glycerol, lactose, and maltose. Nevertheless, despite the original description of P. sonchi (Hong et al., 2009), we verified through five independent experiments that its strain X19-5 T is actually able to form acid from these carbon sources. Another finding was that P. riograndensis SBR5 T is indeed capable to reduce nitrate, as predicted by genome sequence analysis (Brito et al., 2015), contradicting its original report (Beneduzi et al., 2010). As a matter of fact, divergent biochemical profiles were relatively common among different published reports, and at least one of them was problematic because of typographic errors. Therefore, taking these observations into account, at least for the Paenibacillus species studied here, biochemical profiling lacked reproducibility.
This same problem was found while revising fatty acid profiles from independent taxonomic reports, which presented many inconsistencies among Paenibacillus strains. Although fatty acid profiling is an essential prerequisite for bacterial species description, at least in our revision it did not prove to be portable, a desirable feature for chemotaxonomic characters (Logan et al., 2009).
Bacterial identification based on phenotypic traits tend to be less accurate than identification based on genotypic methods (Maughan and Van der Auwera, 2011). First, species may be composed of phenotypically heterogeneous strains (Kumar et al., 2015), i.e., sometimes it is difficult to obtain a common phenotypic pattern among strains of a species. Besides that, reliable phenotypic data are only obtained when strains are assayed simultaneously under carefully controlled culture conditions (Rosselló-Móra, 2012), since minimum variations can affect gene expression, and consequently, the phenotype. Moreover, as such data are usually descriptive, their criteria for species circumscription are not clear; therefore they should be carefully inspected for taxonomic classification purposes.
The homogeneity of P. riograndensis, P. sonchi, P. graminis, and P. jilunlii was also observed regarding their genotypic and genomic traits. Based on the correlation between 16S rRNA FIGURE 7 | Core-proteome phylogeny of Paenibacillus species. The core-proteome rooted tree was constructed using the Neighbor Joining method. Details are as shown in Figure 1, unless specified otherwise. Bootstrap values greater than 70% are shown next to the branches.
identity and genomic relatedness, it was recommended that a 16S rRNA gene identity of ∼98.5% is the adequate minimum threshold for species demarcation (Stackebrandt and Ebers, 2006;Meier-Kolthoff et al., 2013b;Kim et al., 2014). However, this recommendation was not applicable to differentiate the species P. graminis, P. jilunlii, P. sonchi, and P. riograndensis, whose 16S rRNA genes are highly conserved. As demonstrated by other studies, depending on the organisms investigated, 16S rRNA gene may not be a proper taxonomic marker to discriminate organisms at the species level (Fox et al., 1992;Chan et al., 2012).
Similarly, G+C content is also a characteristic with limited taxonomic value for these Paenibacillus species. A G+C content variation of at most 1 percentual point is expected for intraspecies comparisons computed from genome sequences (Meier-Kolthoff et al., 2014b). However, the G+C contents of P. graminis DSM 15220 T , P. jilunlii DSM 23019 T , P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34 are very similar, not varying more than 1 percentual point among themselves. It is worth noting that the G+C content of P. riograndensis SBR5 T and P. sonchi X19-5 T were originally measured as 55.1 and 46.8%, respectively. This difference is substantial, since at least 10 percentual points of difference is expected to be found in species of distinct genus (Schleifer, 2009). However, nucleotide compositions were determined using indirect methods, which are less accurate in relation to those obtained using genomic data (Meier-Kolthoff et al., 2014b).
The structural conservation over the whole length of genomes from P. riograndensis SBR5 T and other Paenibacillus strains, namely P. sonchi X19-5 T , P. jilunlii DSM 23019 T , P. graminis DSM 15220 T , Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34 was discernible, and the genomic collinearity and proteomic conservation denoted the close phylogenetic relationship among these strains. Indeed, these Paenibacillus strains were only discriminated using methods based on genomic metrics. In this sense, whole genome data of P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34 generated by ANI, MiSI and dDDH were congruent. Therefore, these four bacterial strains would compose a single species, P. sonchi, which has name priority, since it was published first. Moreover, the results confirmed that P. graminis DSM 15220 T and P. jilunlii DSM 23019 T belong to other species.
Genome based metrics showed better resolution than 16S rRNA gene identity values. In fact, the resolving power of genome based methods was already explored to discriminate taxa at the infra-specific level. Meier-Kolthoff et al. (2014a) suggested that bacterial subspecies could be discerned based on dDDH values higher than 79%. Given this, P. sonchi could be divided in three subspecies, one harboring P. sonchi X19-5 T , other harboring P. riograndensis SBR5 T , and finally, another one containing Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34. Although a quantitative measure is an important initiative for subspecies definition, bacterial taxonomy still relies on the interpretation of phenotypic data for this purpose. Therefore, these strains could not be considered subspecies, but genomovars, i.e., they represent genotypic entities at the subspecies level, but they do not present differential phenotypic characters required for categorization as subspecies (Schloter et al., 2000). Gevers et al. (2005) stated that threshold methods such as ANI and dDDH could fail for delineating species when they contradict species phylogeny (for example, species having ANI values greater than 95% may not form a monophyletic group). Therefore, for taxonomic purposes, it is indispensable for ANI and dDDH values to be assisted by phylogenetic analyses.
The 16S rRNA gene phylogeny supported the results found in analyses based on genomic metrics, since P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 formed a clade. This phylogenetic reconstruction showed that P. sonchi and P. riograndensis are closer to each other than to any other Paenibacillus species. Logan et al. (2009) suggested that genes other than 16S rRNA gene could provide higher resolution for taxonomic analyses at species level. Indeed, the evolutionary relationships among P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 were not clear in 16S rRNA gene phylogeny. Phylogenetic reconstructions based on gyrB, recA, recN and rpoB genes, and concatenated multiprotein sequences confirmed the monophyly of P. riograndensis SBR5 T , P. sonchi X19-5 T , Paenibacillus sp. CAR114, and Paenibacillus sp. CAS34. Furthermore, they also were more discriminative than the 16S rRNA gene phylogeny. With the exception of recA phylogeny, the closest relationship of Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 predicted by genome-genome comparisons was also demonstrated in these phylogenetic reconstructions.
All above findings strongly support that Paenibacillus sp. CAR114 and Paenibacillus sp. CAS34 belong to P. sonchi species, and that P. riograndensis is a later heterotypic synonym of P. sonchi. Considering phylogenetic reconstructions and dDDH values, we propose that three P. sonchi genomovars should be defined. Since genome approaches were essential for taxonomic classification of these Paenibacillus species, we consider that genomic metrics and phylogenies of genes other than 16S rRNA gene should be compulsory in the guidelines for describing new taxa.
Emended Description of Paenibacillus sonchi Hong et al. (2009) Paenibacillus sonchi (son'chi. L. n. sonchus -i, the herb sowthistle, and also a botanical genus name; L. gen. n. sonchi of Sonchus, referring to the plant Sonchus oleraceus, the source of the rhizosphere soil from which the type strain was isolated). Description as that given by Hong et al. (2009), except for the following modifications. Acid is produced from glucose, sucrose, lactose, D-xylose, maltose, D-sorbitol. Catalase-positive. Nitrate is reduced to nitrite. The G+C content of the DNA of the type strain X19-5 T is 50.36 mol%.

AUTHOR CONTRIBUTIONS
FS and LP conceived and designed the experiments. FS, AA, LB, VW, and EdS wrote the paper. RdS, GdC, and EvB performed the biochemical assays. EdB and VB carried out the genome sequencing. EdS and FdO contributed with reagents and equipments. LB and VW performed the synteny analysis. AA performed the PCA of fatty acid profiles and analyzed the biochemical profiles of Paenibacillus strains. FS performed most of comparative genome and phylogenetic analyses and generated most of the artwork and tables. FS, AA, EvB, VW, EdS, and LP discussed the data and reviewed the manuscript.

ACKNOWLEDGMENTS
We thank Dr. Tiziano Dalla Rosa and M.Sc. José Evandro Saraiva Pereira for helping us to freeze-dry the Paenibacillus cultures. We acknowledge Dr. Jean-François Pombert for allowing us to assemble the genomes on Mozart supercomputer. We also thank CESUP-UFRGS for letting us use their computer clusters.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2017.01849/full#supplementary-material