Comparative Genomics of Bacillus amyloliquefaciens Strains Reveals a Core Genome with Traits for Habitat Adaptation and a Secondary Metabolites Rich Accessory Genome

The Gram positive, non-pathogenic endospore-forming soil inhabiting prokaryote Bacillus amyloliquefaciens is a plant growth-promoting rhizobacterium. Bacillus amyloliquefaciens processes wide biocontrol abilities and numerous strains have been reported to suppress diverse bacterial, fungal and fungal-like pathogens. Knowledge about strain level biocontrol abilities is warranted to translate this knowledge into developing more efficient biocontrol agents and bio-fertilizers. Ever-expanding genome studies of B. amyloliquefaciens are showing tremendous increase in strain-specific new secondary metabolite clusters which play key roles in the suppression of pathogens and plant growth promotion. In this report, we have used genome mining of all sequenced B. amyloliquefaciens genomes to highlight species boundaries, the diverse strategies used by different strains to promote plant growth and the diversity of their secondary metabolites. Genome composition of the targeted strains suggest regions of genomic plasticity that shape the structure and function of these genomes and govern strain adaptation to different niches. Our results indicated that B. amyloliquefaciens: (i) suffer taxonomic imprecision that blurs the debate over inter-strain genome diversity and dynamics, (ii) have diverse strategies to promote plant growth and development, (iii) have an unlocked, yet to be delimited impressive arsenal of secondary metabolites and products, (iv) have large number of so-called orphan gene clusters, i.e., biosynthetic clusters for which the corresponding metabolites are yet unknown, and (v) have a dynamic pan genome with a secondary metabolite rich accessory genome.

Bacillus amyloliquefaciens promotes plant growth using diverse mechanisms including indole-3-acetic acid (IAA) synthesis (Shao et al., 2015;Liu et al., 2016), phosphorus solubilisation (Ravari and Heidarzadeh, 2014) and potassium solubilisation (Shakeel et al., 2015). Extracellular phytase, for instance, is considered as a plant growth promoting factor for improvement of phosphorus-use efficiency by plants (Shao et al., 2015). Bacillus amyloliquefaciens has also been used as biocontrol of numerous plant diseases caused by soil-borne microorganisms (Islam et al., 2016;Tan et al., 2016), post-harvest pathogens , insects (Aziz et al., 2016), nematodes (Castaneda- Alvarez et al., 2016), and aphids (Gadhave and Gange, 2016). Moreover, B. amyloliquefaciens has been reported to directly antagonize plant pathogens by competing for essential nutrients (Wu et al., 2016), producing antibiotic compounds (Srivastava et al., 2016) and inducing systemic acquired resistance (Ng et al., 2016). Volatile components such as acetoin have shown to be a potent inducer of systemic acquired resistance in plants (Magno-Perez-Bryan et al., 2015). Cyclic dipeptide such as cyclo(L-leucyl-L-prolyl) mitigates virulence in pathogenic bacteria (Gowrishankar et al., 2016). Additionally, biofilmproducing bacteria on the plant-root surfaces show promise for the use in the control of soil-borne pathogens (Tan et al., 2016). Therefore, it is currently regarded as promising environmental friendly means for crop protection (Wei et al., 2015). Recently, using comparable concentrations of B. amyloliquefaciens to those expected when the bacteria are used as Plant growth-promoting rhizobacteria (PGPR) and biocontrol agent (10 7 cells ml −1 ) proved non harmful to the non-target soil dwelling earthworms (Lagerlof et al., 2015). Therefore, B. amyloliquefaciens could be safely used to optimize ecosystem services and resilience toward the development of sustainable agricultural systems.
Comparative genomic analysis in B. amylioliquefaciens is made possible by the recent sequencing of multiple strains of the species. Similar to other bacterial groups the conserved "core" genome is defined as the shared genetic material among nearly all the strains of the species. The core genome contains majority of housekeeping genes and is interspersed with "accessory" genomic parts. It is believed that accessory genome is present in some strains while being absent in the rest of the species strains (Ozer et al., 2014).
In the current study, genomes of 48 strains of B. amylioliquefaciens available in GenBank (genomes submitted until December, 2016) have been mined for genes contributing to plant-beneficial functions and therefore, plant growth promotion potential and secondary metabolite arsenal. The contribution of core and accessory genome to plant growth promotion and secondary metabolite biosynthesis are also discussed.

Selection of Genomes and Genome Phylogeny
Genomes of B. amyloliquefaciens used in the study were selected among those submitted until December, 2016 in GenBank DNA database. They all have been deposited under the nomination B. amyloliquefaciens. The genomes and their corresponding strains have been described in Table 1. Nucleotide as well as the amino acid sequences of the whole genomes and the deduced coding sequences were retrieved from the GenBank DNA database for all strains (Table 1). Whole genome alignments have been conducted using REALPHY (The Reference sequence alignment based phylogeny builder, available at http://realphy.unibas.ch; Bertels et al., 2014). A Maximum Likelihood (ML) algorithm (Felsenstein, 1981) as implemented in MEGA v. 6 (Tamura et al., 2013) with evolutionary distances computed using the Kimura 2-parameter model (Kimura, 1980) was used to build the phylogenetic tree. Validity of branches in the resulting tree was evaluated by bootstrap re-sampling support of the data sets with 1,000 replications. Average nucleotide identity (ANI) values of B. amyloliquefaciens strains were estimated using the algorithm developed by Goris et al. (2007) combined with the 95∼96% cut-off for species boundary proposed by Richter and Rosselló-Móra (2009), as implemented in the server EzBioCloud available at http://www.ezbiocloud.net/tools/ani (Yoon et al., 2017). In silico genome-to-genome distance values were calculated using the web-based DSMZ service available at http://ggdc.dsmz.de

Root Colonization and Growth Promotion Factors
The presence of gene clusters (flgBCDEGKLMN, flhABFOP) and the swrABC gene cluster have been searched in the genomes of the different B. amyloliquefaciens targeted genomes (Ghelardi et al., 2012). che/fla/fli/tlp/mcp operons involved in the regulation of B. subtilis chemotactic response and their relatives in the genome of Bacillus velezensis UCMB5113, motABPS cluster responsible for cell-envelope and cellular processes motility and chemotaxis, have been mined in the different genomes studied (Niazi et al., 2014). The xerCD genes, site recombinase, are critical for the PGPRs to be effective rhizosphere colonizers (Shen et al., 2013) have been mined. Annotation and homologybased searches were conducted in the Bacillus genomes for genes encoding exopolysaccharide using B. subtilis epsA-O operon genes, tapA, tasA, sipW, pgsB, and bslA (Vlamakis et al., 2013).
The IpdC gene directs the production of phenylacetic acid (PAA), having weak auxin activity and antimicrobial against both bacteria and fungi in Azospirillum brasilense (Somers et al., 2005). As in Azospirillum, the B. simplex genome has the paa operon (data not shown), which is important for the degradation of PAA.
The gene of A. brasilense Sp245 nirK copper nitrite reductase and Bacillus nitric oxide synthase (nos) genes leading to formation of NO and hence root branching was used to mine the B. amyloliquefaciens genomes (Bruto et al., 2014).
In B. subtilis OKB105 polyamines such as spermine, spermidine, and putrescine have PGP properties (Xie et al., 2014). Genes involved in polyamine synthesis such as speA (agmatine synthesis), speB (putrescine synthesis); speD and speE (spermidine synthesis) and metK, responsible for the conversion of methionine to S-adenosyl-methionine were mined. Genes for various binding proteins, permeases, and transporters for polyamines have also been mined by keyword searches in the different genomes.

Antibiotics and Related Compounds
hcnABC genes directing production of HCN in Pseudomonas spp. have been used to mine B. amyloliquefaciens genomes (Bruto et al., 2014). phlACBD genes were used in blast searches to discover similar sequences in the genomes of the mined B. amyloliquefaciens strains (Bruto et al., 2014). gabD and gabT involved in production of pest/disease suppressing γaminobutyric acid (GABA) (Loper et al., 2012) have been used as baits in genome mining.

Resistance to Drugs
Homologues of the tetB protein that contributes to tetracycline resistance and the tetR tetracycline operon transcriptional regulator tetR in B. subtilis have been searched in the different genomes (Sakaguchi et al., 1988). Multifunctional tetracyclinemetal/H + antiporter (tetA) have also been mined (Someya et al., 1995). The operon yyaACDEHJKLRST encoding a streptothricin acetyltransferase (Jacob et al., 1994) was used as a bait in the screening of homologs in the different genomes. Fosfomycin resistance gene fosB from B. cereus was used to search for homologs in the B. amyloliquefaciens genomes . The homolog of the B. licheniformis glyoxalase/bleomycin resistance gene ykcA have been used as a bait in the blast search against mined genomes (Rey et al., 2004). The homolog of the B. subtilis (strain 168) β-lactamase gene penP have been used as a bait in the blast search against mined genomes (Barbe et al., 2009). Quinolone resistance norA homology have been searched in the different B. amyloliquefaciens genomes (Neyfakh et al., 1993). The E. coli gene floR have been mined in the B. amyloliquefaciens genome collection (Doublet et al., 2005). Bacillus subtilis 168 aadK gene, which encodes aminoglycoside 6-adenylyltransferase, a streptomycin-modifying enzyme, was mined in the different strains (Noguchi et al., 1993). Bacillus subtilis ycbJ gene encoding an aminoglycoside phosphotransferase has been used to search homologs in the genomes of the mined strains (Hosoya et al., 2002). Bacillus subtilis vmlR encoding antibiotic efflux ATPbinding transport protein has been used to mine the different genomes (Ohki et al., 2005). Genes encoding putative multidrug exporters have been mined from the different genomes according to Niazi et al. (2014).

Resistance to Heavy Metals
The genes arsABC and ywrK were used as a bait to detect any putative arsenic detoxification ability (Duan et al., 2013). We have mined the copYZAB operon formed by four genes: copA and copB that encode ATPases for influx and efflux of copper, respectively; copZ that encodes a copper chaperone; and copY, a copper responsive repressor. CopA encodes a major copper resistance mechanism. One-component regulators CueR, CopY, and CsoR, identified in E. coli, E. hirae, and M. tuberculosis, respectively, have also been mined (Rademacher and Masepohl, 2012). CtpAB and ycnJ genes encoding copper resistance proteins  were also mined. Homologs of the B. subtilis (strain 168) ynbB gene have been mined in the different genomes (Barbe et al., 2009). Homology of crcA, cspE, crcB,yhdV has been mined in all the genomes (Hu et al., 1996). Homologs of the yceGH and yaaN have been searched in all genomes (Franks et al., 2014). CzcD encodes a cadmium, cobalt and zinc/H(+)-K(+) antiporter in B. subtilis and protects the cell against elevated levels of Zn(II), Cu, Co(II), and Ni(II) (Moore et al., 2005). GenendoA (ydcE) and antitoxin gene, ndoAI (ydcD) have been mined (Wu et al., 2011). Sensors for metals; Fur, ArsR, MerR, NikR, DtxR, mtnR, and yfmP family of metalloregulators of the B. subtilis genome were mined from the different B. amyloliquefaciens genomes (Osman and Cavet, 2010).

Degradation of Aromatic Compounds
Vanillate, 4-hydroxybenzoate, salicylic, ferulic, p-coumaric acids are considered as natural toxins and cause specific stress responses in microorganisms that have developed resistance against phenolic acids. Both phenolic acid decarboxylases padC and bsdBCD (yclBCD) of B. subtilis were mined. The putative LysR-type regulator encoded by bsdA (yclA) gene upstream of the bsdBCD operon revealed is the transcriptional activator of bsdBCD expression in response to phenolic acids were also mined (Graf et al., 2016). Dibenzothiophene (DBT) is the model compound for this class of molecules. The operon dszABC of Rhodococcus sp. (Piddington et al., 1995) was used to mine the genomes. Genes encoding homologs of the B. velezenzis FZB42 azoR2, mhqADNOPE have been mined in the genome of the different strains (Nguyen et al., 2007).

Identification of Core Genome and Accessory Genomes of the Strain Collection
Spine, used to determine the core genome, defined as those sequences present in nearly all genomes from bacteria of a given species, from the sequences of all B. amyloliquefaciens isolates collection (Ozer et al., 2014). Identification of accessory genomic sequences in the different B. amyloliquefaciens isolates genomes was performed using Agent (Ozer et al., 2014).

Species Status of B. amyloliquefaciens
In total, 48 strains of the species (submitted until December 2016) have been selected for genome mining. Their genome size varied between 3.60 and 7.60 mega base pairs (MB) ( Table 1). GGDC analysis revealed the presence of three species lumped together in the strains collection sensu Meier- Kolthoff et al. (2013), where 70 % similarity between two genomes was established as the gold standard threshold for species boundaries (Figure 1A), ANI analysis revealed also three putative species sensu Richter and Rosselló-Móra (2009), where 95-96% cut-off was set up to delimit species boundaries. In both analysis, a set of 10 strains represented probably the "true" B. amyloliquefaciens species termed "B. amyloliquefaciens sensu stricto" while a set of 37 strains matched B. velezensis and a single isolate represented new species, yet to be described (Figures 1A,B). The proposed threshold for species discrimination (70%) clearly delimit species boundaries because strain pairs were found to be between 50 and 70% GGDC distance. GGDC values plotted against ANI values ( Figure 1C) showed agreement between the two technologies for species discrimination and no discontinuity in the graph could be observed. Finally, whole genome phylogeny confirmed results using GGDC and ANI values, with three sister branches representing the three species (Figure 1D).

Bioinformatic Evaluation of Plant Growth Promotion Potential of B. amyloliquefaciens Strains
Bioinformatic evaluation of plant growth promotion potential of B. amyloliquefaciens strains collection has been performed through homology-based mining of genes contributing to plantbeneficial functions. As unambiguously shown in Figure 2, large majority of B. amyloliquefaciens strains show presence of mined genes independently of whether these strains are represented by a complete coverage of the genome or their association to plant rhizosphere.

Secondary Metabolites from B. amyloliquefaciens
Secondary metabolite clusters present in the genome of the B. amyloliquefaciens collection have been evaluated using antiSMASH 3.0 (Weber et al., 2015), prediction informatics for secondary metabolomes (PRISM) (Skinnider et al., 2015), NapDos (Ziemert et al., 2012), NP.search (Li et al., 2009), and the bacteriocin specific software BAGEL3 (Van Heel et al., 2013). As shown in Figure 3 and Supplementary Table S1, different strains showed high levels of diverse secondary metabolite clusters using all implied programs. Rarefaction analysis of secondary metabolite clusters from the results of genome sequencing progress clearly attested that saturation could not be reached using all genome collection analyzed ( Figure 3B). A very clear correlation between genome size and number of gene clusters known to be involved in secondary metabolite biosynthesis and mined by antiSMASH was found. Approximately 65% of the variance in the number of secondary metabolite clusters can be explained by genome size (Figure 3C). However, for PRISM only 41% of the variance in the number of secondary metabolite clusters can be explained by genome size (Figure 3D).

Genomes to Natural Products Prediction in B. amyloliquefaciens
Natural products prediction in the core genome and the accessory genomes of the B. amyloliquefaciens collection revealed high numbers of unknown secondary metabolites across the strains analyzed ( Figure 4A and Supplementary Table S1). Only bacillibactin could be found in all the strains and in the core genome of B. amyloliquefaciens ( Figure 4A). All remaining known secondary metabolites such as surfactin, difficidin, fengycin, macrolactin, bacillaene, bacilysin, and mersacidin are harbored by the accessory genome of the different strains. Only 3% of the variance in the number of secondary metabolite clusters can be explained by accessory genome size ( Figure 4B).

Species Status of B. amyloliquefaciens
Given the high phenotypic similarity of B. amyloliquefaciens to B. subtilis and other closely related Bacillus spp. such as B. velezensis, it is not possible to distinguish these organisms solely on the basis of conventional assays (Dunlap et al., 2015 and. Sequencing of 16S rRNA gene, while has historically been used in defining bacterial taxonomy and phylogeny, proved difficult and controversial that lead to well-documented misidentifications (Hahnke et al., 2016). Therefore, considerable taxonomic confusion blurs biotechnological applications of this highly relevant group. Recently, genome based approaches such as Average Nucleotide Identity (ANI) and digital DNA-DNA hybridization (DDH) calculated using the Genome-to-Genome Distance Calculation (GGDC) complemented with genome comparisons, alignments and phylogenetic reconstructions have been suggested as alternative methods for species discrimination (Goris et al., 2007;Richter and Rosselló-Móra, 2009;Meier-Kolthoff et al., 2013). Using these accurate tools, several later heterotypic synonyms were documented in this group such as B. methyltrophicus, B. amyloliquefaciens subsp. plantarum, and B. oryzicola that have been shown, using phylogenomics, later heterotypic synonyms of B. velezensis (Dunlap et al., 2016). Therefore, phylogenomic approaches are urgently required to resolve outstanding problems in the phylogenetic systematics of the B. subtilis group (Dunlap et al., 2016). Phylogenomic analysis of all sequenced genomes of B. amyloliquefaciens strains available in GenBank, the National Centre for Biotechnology Information (NCBI) database (Table 1), allowed us to check taxonomic validity of these isolates, determine the extent of inter-species genome variability within B. amyloliquefaciens and reconstruct their phylogenetic relationships. Figures 1A-D clearly showed that at least three Bacillus spp. were lumped under the name B. amyloliquefaciens along with B. amyloliquefaciens sensu stricto. While isolates DC12, EBL11, EGD-AQ14, JRS8, HB26, JRS5, LX-11, 11B91, 12B, 629, B1895, B4140, Bs006, H57, Jxnuwx-1, LPL-K103, M49, RHNK22, TF28, UASWS BA1, Y2, IT-45, CECT 8238, CC178, LFB112, CECT 8237, KHG 19, L-H15, L-S60, MBE 1283 velezensis in ANI and GGDC analysis (data not shown), JJC33M failed to match known species and should be described as a new species. Bacillus isolates CMW1, B425, TA208, LL3, XH7, DSM7, RD7-7, SRCM101266, SRCM101294, and K2, should therefore be regarded as B. amyloliquefaciens sensu stricto. Phylogenomic tree based on the core genome of all isolates of B. amyloliquefaciens showed consistent results with earlier observations using either ANI or GGDC values. Our findings suggest that despite the pivotal role of microbial taxonomy in industrial exploitation of microbes and their products, classification and accurate identification have often been a neglected task. We recommend inclusion of phylogenomic studies as a prerequisite gold standard to the use of the name B. amyloliquefaciens in new reports.

Bioinformatic Evaluation of Plant Growth Promoting Potential of B. amyloliquefaciens Strains
Genome mining of the different strains of B. amyloliquefaciens allowed the discovery of numerous features documented in earlier studies as efficient factors of the interaction between host plants and the associated B. amyloliquefaciens strains (Niazi et al., 2014;Zhang, N. et al., 2016). These features allow nutrient acquisition, PGPR fitness, root colonization and growth promotion factors, plant growth promoting traits (hormones), plant protection from oxidative stress, plant induction of disease resistance, antibiotics and related compounds, resistance to drugs and heavy metals and degradation of aromatic compounds (Bruto et al., 2014;Niazi et al., 2014;Chen et al., 2016;Zhang, N. et al., 2016;Rekik et al., 2017). All these features were present in approximately all the genomes analyzed independently of whether these strains are represented by a complete coverage of the genome or their association to the plant rhizosphere. All these features could be also found in the core genome of the B. amyloliquefaciens sensu-stricto or the three-conserved species core genome. We speculate that plant growth promoting features could be considered as evolutional traits for adaptation to plant-associated habitats as suggested by Zhang, N. et al. (2016).

Secondary Metabolites from B. amyloliquefaciens
Bacillus amyloliquefaciens strains proved a prolific source of diverse secondary metabolite classes including polyketides (PKs) such as macrolactins and difficidins, peptides such as bacteriocins, lanthipeptides such as cerecidins, and lipopeptides (LPs) such as surfactins and iturins (Cimermancic et al., 2014;Wang et al., 2014;Aleti et al., 2015). PKs and LPs are the key inhibitors of plant pathogens and strains bearing these metabolites have been widely used in agriculture (Cochrane and Vederas, 2014). Despite the exponential increase of the number of B. amyliquefaciens genomes sequenced and the description of efficient analysis tools for secondary metabolite prediction, cursory investigation of these genome's wealth is available for describing the novelties and predicting uncharacterized metabolites (Aleti et al., 2015). In our study using recently described bioinformatic tools designed for the identification of clusters involved in secondary metabolism such as PRISM (Skinnider et al., 2015), antiSMASH 3.0 (Weber et al., 2015), NapDos (Ziemert et al., 2012), NP.searcher (Li et al., 2009), and the bacteriocin specific software BAGEL3 (Van Heel et al., 2013) and the B. amyloliquefaciens genomes available in databases, we documented high structural and functional diversity of secondary products in the species and their underlying gene clusters. Our data clearly showed high variety of secondary metabolites suggested by the high number of matches using five different programs for their prediction.
Rarefaction analysis of secondary metabolite clusters from the results of genome sequencing progress demonstrated clearly that saturation could not be reached using all genomes available and more sequencing effort of new strains is necessary to tackle the wide diversity of secondary metabolites potentially harbored by the species. This result confirmed the observations of Alenezi et al. (2016b) using the genus Aneurinibacillus. A very clear correlation between genome size and number of gene clusters known to be involved in secondary metabolite biosynthesis and mined by antiSMASH and PRISM was found. About 65% of the variance in the number of secondary metabolite clusters can be explained by genome size for antiSMASH for instance. This confirmed the results established by Jeske et al. (2013) while contrasted those conducted by Machado et al. (2015) and Alenezi et al. (2016b).

Genomes to Natural Products Prediction in B. amyloliquefaciens
Genome mining was also used to predict uncharacterized gene clusters and evaluate their potential to produce new yet to be characterized secondary metabolites. We found that while few known secondary metabolites such as surfactin, difficidin, bacilysin, fengycin, macrolactin, bacuillaene, and bacillibactin were identified, hundreds of secondary products still await for accurate molecular identification and the assignment of subsequent biological function. Similar finding has been reported by Jeske et al. (2013), Machado et al. (2015), and Alenezi et al. (2016b). Dynamics of evolution of the clusters was also investigated using comparative genomics across all known core and accessory genomes of B. amyloliquefaciens strains. Our findings unambiguously suggested that except bacillomycin, all remaining known or unknown secondary metabolites were harbored by the strains specific accessory genomes. This finding highlights the extraordinary potential offered by these plants associated Bacillus spp.

SUMMARY AND OUTLOOK
Our findings clearly suggest plant growth promoting features as evolutional traits for adaptation of B. amyloliquefaciens sensu lato to plant-associated habitats. They also document large repertoire of secondary metabolites harbored by a dynamic accessory genome that warrants more genome sequencing efforts of B. amyloliquefaciens sensu lato in order to shed the light on the wealth of these natural products offered by these bacteria.

ETHICS STATEMENT
This research did not involve any work with human participants or animals by any of the authors.

FUNDING
This project was funded by the Government of Kuwait (to FA) and the European Union Seventh Framework Programme under grant agreement 245268 (ISEFOR; to LB). Further support came from the SwissBOL project, financed by the Swiss Federal Office for the Environment (grant holder LB) and the Sciex-Scientific Exchange Program (https://www.swissuniversities.ch/en/topics/ sciex/) (NMS.CH; to LL and LB). LL and EP are indebted to the Ministry of Education, Science, Research and Sport of the Slovak Republic for financial support in the frame of the project "VEGA 1/0046/16." Part of the study was financially supported by the Life Plus project HESOFF, Life 11 ENV/PL/459 financed by the European Union and the National Fund for Environmental Protection and Water Management in Warsaw (Grant to TO).

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2017.01438/full#supplementary-material