- 1Laboratory of Applied Biochemistry and Immunology, University Joseph KI-ZERBO, Ouagadougou, Burkina Faso
- 2National Key Laboratory of Agricultural Microbiology, Key Laboratory of Environment Correlative Dietology, College of Food Science and Technology, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, China
- 3College of Food Science and Technology, Zhejiang University of Technology, Hangzhou, China
Experimental studies, though often very costly, have led to the discovery of the antimicrobial products known today. Yet pathogenic microorganisms are proving increasingly resistant to existing antimicrobial molecules, and this is a cause for worldwide concern. It is therefore necessary to search for new molecules that could serve as alternatives in the food, medical and agricultural sectors. To this end, 123 complete genomes of Bacillus strains isolated from soil and fermented foods were analyzed and annotated using bioinformatics prediction and characterization tools. The aim was to discover new gene clusters for the biosynthesis of non-ribosomal peptides (lipopeptides, siderophores, antibiotics). This study revealed that 83% of the genomes analyzed possess biosynthetic gene clusters for the production of the siderophore bacillibactin, 61% for surfactins, 37% for fengycins, 23% for iturins, 15% for kurstakins and 3% for bacitracin. In addition, seven new biosynthetic gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were identified in B. velezensis ATR2, B. velezensis DSYZ, B. velezensis CGMCC 11640, B. amyloliquefaciens HM618, B. amyloliquefaciens WF02, B. cereus CMCC P0011, B. cereus CMCC P0021, B. subtilis SJ-10 and B. anthracis CMF9. The results of this study reveal a significant potential of the genus Bacillus to produce new non-ribosomally synthesized peptides. With these new gene clusters identified, the predicted antimicrobial molecules can now be targeted for experimental study.
Highlights
• Seven novel biosynthetic gene clusters that encode Non-Ribosomal Peptide Synthetases (NRPS) were discovered.
• A strain of Bacillus can potentially co-produce 4 families of lipopeptides.
• The genus Bacillus has a significant potential to produce new non-ribosomally synthesized peptides.
• A genome mining strategy makes it possible to discover new metabolites.
1 Introduction
Bacillus are Gram-positive bacteria belonging to the phylum Firmicutes and the family Bacillaceae. They are spore-forming, aerobic or facultatively anaerobic, and obtain energy by respiration or fermentation (Tamang et al., 2016). Their sporulating ability allows them to resist adverse environmental conditions. They are found in water, soil, dust, plants, food and the digestive tract of animals. Some species of the genus Bacillus play important roles as producers of antibiotics or antifungals (Khurana et al., 2020). For example, locillomycins and surfactin from B. subtilis 916 show antimicrobial activity against Xanthomonas oryzae and Fusarium oxysporum (Zhao et al., 2018). To date, 788 species of the genus Bacillus have been sequenced, leading to various studies and explorations (www.ncbi.nlm.nih.gov/genome/?term=Bacillus, consulted 05/07/2024).
Indeed, many Bacillus strains have the ability to produce a wide variety of extracellular enzymes and nonribosomal secondary metabolites such as lipopeptides, the bacillibactin siderophore or antimicrobial compounds such as bacitracin which is a semi-cyclic peptide antibiotic commercialized as a mixture of polypeptides for the treatment of Gram-positive bacterial infections (May et al., 2001; Tapi et al., 2010). Other peptides, such as the lipopeptides have antimicrobial properties and can be used in food or soil biopreservation (Zhao et al., 2018).
Bacillus lipopeptides are subdivided into five families (fengycin, iturin, surfactin, kurstakin, and locillomycin) in which various subfamilies and variants can be found (Lam et al., 2021; Waongo et al., 2023). They are produced by modular mega-enzymes or assembly lines referred to as NRPSs (Iqbal et al., 2021). The NRPSs are composed of modules responsible for the incorporation of amino acids into the final nonribosomal peptide (Théatre et al., 2021). The modules contain the following main catalytic domains: the adenylation domain (A), responsible for amino acid selection and activation; the carrier protein (CP), responsible for tethering the amino acid to the enzyme; and the condensation domain (C), which forms a peptide bond between two amino acids attached to two consecutive modules (Leclère et al., 2016). Some NRPSs also include optional domains that modify the incorporated amino acids during synthesis, such as the epimerization domain (E), which converts a residue to its D-isomer (Duban et al., 2022). The enzymatic domains work step by step to assemble the monomers into the peptide, so that NRPSs can be compared to assembly lines. A particular trait of NRPSs synthesizing lipopeptides (LPs), except those belonging to the iturin and locillomycin families, is the presence of a condensation domain at the start of the assembly line. The role of this so-called C-starter is to condense the fatty acid onto the first amino acid of the peptide chain. The fengycin family includes the decapeptides fengycin A, B and C, and plipastatins A, B and S. The iturin family includes bacillomycins (D, DC, F, L and Lc), mycosubtilin, mojavensin A, subtulene A, mixirin and iturins (A, AL and C), all containing 7 amino acids. Heptapeptides are also found in the surfactin family and its variant forms lichenysin, pumilacidin and esperin, as well as in the kurstakin family. The fifth family contains the nonapeptides locillomycins A, B, and C (Luo et al., 2015a; Waongo et al., 2023).
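The assembly-line logic described above can be illustrated with a minimal Python sketch. It is not the data model of any annotation tool: the class name `Module`, the function `predicted_peptide` and the three-residue example are hypothetical and chosen only to show how the A domain specificity, an optional E domain and a C-starter would translate into a predicted lipopeptide string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module (hypothetical model, for illustration only)."""
    substrate: str                    # amino acid selected by the A domain
    has_epimerization: bool = False   # optional E domain gives the D-isomer


def predicted_peptide(modules: List[Module], c_starter: bool = False) -> str:
    """Read the modules in order, as the assembly line would."""
    residues = []
    for module in modules:
        prefix = "D-" if module.has_epimerization else ""
        residues.append(prefix + module.substrate)
    chain = "-".join(residues)
    # A C-starter condenses a fatty acid (FA) onto the first residue.
    return "FA-" + chain if c_starter else chain


# Illustrative three-module synthetase, not a real Bacillus peptide.
toy_nrps = [
    Module("Glu"),
    Module("Leu", has_epimerization=True),
    Module("Val"),
]
print(predicted_peptide(toy_nrps, c_starter=True))  # FA-Glu-D-Leu-Val
```

The sketch ignores cyclization and iterative use of modules, which would need additional fields.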
Many drugs remain effective; however, the emergence of resistant organisms has become a major concern, drawing global attention and promoting the “One Health” concept (Aslam et al., 2021). This concept highlights the close links between human, animal, and environmental health. In this context, nonribosomal secondary metabolites produced by the genus Bacillus can be a source of interesting active compounds. Indeed, some antimicrobial peptides have been shown to be antibiotics with a broad spectrum against pathogens and lipopeptides and siderophores, which are not antibiotics but represent an alternative for food preservation and plant protection against phytopathogenic fungi (Huan et al., 2020).
An urgent challenge today is to discover new natural drugs to tackle emerging human, animal and plant pathogens. The identification of these biomolecules is usually performed by experimental studies including isolation, characterization, purification and activity tests (Fanaei et al., 2021). With the development of bioinformatics tools, this approach can now be accelerated and complemented by sequencing and subsequent genome or metagenome mining to identify natural product biosynthetic pathways (Blin et al., 2021).
Regarding nonribosomal peptide synthesis, specific tools have been developed to identify biosynthetic gene clusters in genome sequences, to decipher the organization of NRPSs into modules and domains, and to predict the nature of the monomers activated by each adenylation domain as well as their stereochemistry, based on the presence of epimerization domains (Blin et al., 2023). Notably, bacterial genome mining via in silico analysis, using the program antiSMASH, offers an attractive opportunity to discover new secondary metabolites such as NRPs (Rahman et al., 2014; Aleti et al., 2015). AntiSMASH 7 allows better visualization of the enzymatic assembly line and a clearer representation of the predicted molecules. Predicted peptides can then be characterized by comparison with other peptides available in the Norine database (Flissi et al., 2020). The questions addressed here are (i) to review the structural and functional information together with annotation of gene clusters for known lipopeptides (LPs), antibiotics and siderophores produced by the genus Bacillus and (ii) to elucidate through genome mining the potential products of yet uncharacterized nonribosomal gene clusters. In addition, a bibliographical search was carried out to better understand the lipopeptides produced by Bacillus. The objective of this study is to combine literature exploration and genome analysis to identify and characterize the diversity of nonribosomal peptides, such as antibiotics, siderophores and lipopeptides, that are potentially produced by Bacillus strains isolated from fermented foods and soil.
2 Materials and methods
2.1 An overview of known structures of nonribosomal lipopeptides produced by Bacillus strains
An overview of known structures of nonribosomal lipopeptides generated by Bacillus strains was performed. In brief, Scopus, PubMed, Web of Science, and ResearchGate databases were queried on October 2nd, 2024 using the keywords “lipopeptides AND Bacillus, lipopeptides AND structure, surfactin AND Bacillus, surfactin AND structure, fengycin AND Bacillus, Fengycin AND structure, kurstakin AND Bacillus, Kurstakin AND structure, locillomycin AND Bacillus, locillomycin AND structure, iturin AND Bacillus, iturin AND structure, New lipopeptides AND Bacillus, New lipopeptides Bacillus AND structure, non-ribosomal peptide AND Bacillus.”
2.2 Database search for genome sequences
Sequences (including chromosomes and plasmids) of Bacillus isolated from soil and fermented foods were retrieved from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). A total of 123 complete genomes from Bacillus strains isolated from soil and fermented food samples were selected according to assembly quality (Table 1). To this end, information on assembly, genome size, number of contigs/scaffolds and N50 was examined.
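Among the quality indicators listed above, N50 has a simple standard definition: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The sketch below is only an illustration of that definition; the function name `n50` and the example lengths are hypothetical and are not taken from Table 1.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly from its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty positive input


# Hypothetical fragmented assembly (arbitrary units).
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

For a complete genome made of a single chromosome, N50 is simply the chromosome length, which is one reason complete assemblies are convenient for cluster prediction.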
2.3 Detection of nonribosomal peptide biosynthetic gene clusters (BGCs)
The search for nonribosomal peptide synthetase genes in the genome sequences of 123 Bacillus strains was performed following a previously described workflow (Leclère et al., 2016). First, the prediction of biosynthetic gene clusters (BGCs) was performed using antiSMASH version 7 (Blin et al., 2023). The regions encoding nonribosomal peptide synthetases were further analyzed. The monomer composition of each predicted peptide was compared to all known NRPs annotated in the Norine database (Flissi et al., 2020) in order to identify it or to predict it as a new peptide or a new variant. When a prediction was incomplete because of truncated or fragmented clusters, we further investigated the antiSMASH results to reconstruct partial or complete BGCs by assembling cluster fragments scattered across the genome sequence. The number of genes, the order of modules and domains in the NRPSs, and the predicted A-domain specificities helped identify clusters of known NRP families, even when they were fragmented.
Since poor sequence assembly can affect the quality of predicted metabolites, only high-quality complete genomes were considered in this study.
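The idea of comparing the monomer composition of a predicted peptide with known peptides can be sketched as follows. This is not the search algorithm used by Norine: it is a simple Dice coefficient on monomer multisets, which ignores residue order, stereochemistry and cyclization. The function name `composition_similarity` and the reference entries are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, List


def composition_similarity(query: List[str], reference: List[str]) -> float:
    """Dice coefficient between two monomer multisets (0.0 to 1.0)."""
    total = len(query) + len(reference)
    if total == 0:
        return 0.0
    shared = sum((Counter(query) & Counter(reference)).values())
    return 2 * shared / total


def rank_references(query: List[str],
                    references: Dict[str, List[str]]) -> List[tuple]:
    """Sort reference peptides from most to least similar to the query."""
    scored = [(name, composition_similarity(query, monomers))
              for name, monomers in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder reference set, not actual database records.
references = {
    "reference_1": ["Asn", "Asn", "Thr"],
    "reference_2": ["Leu", "Val", "Gly"],
}
query = ["Thr", "X", "Asn", "Asn"]  # X marks an unpredicted monomer
for name, score in rank_references(query, references):
    print(name, round(score, 3))
```

An unpredicted monomer (X) never matches a reference residue here, so incomplete predictions are penalized; a different convention could treat X as a wildcard.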
3 Results
Out of 123 Bacillus chromosomes selected, 115 were found to carry BGC NRPSs. Among these NRPSs, those responsible for the synthesis of a siderophore (102), antibiotics (4) or lipopeptides (93) were found.
3.1 Prediction of NRP siderophore
The screened sequences harbor genetic potential to produce the siderophore bacillibactin. The gene cluster is composed of three modules with A-domain specificity for 2,3-dihydroxybenzoate (Dhb or diOH-Bz in Norine), glycine and threonine. The corresponding NRPS is known to follow an iterative mode of biosynthesis, which could lead to the production of active iron-chelating compounds named bacillibactins. This small cluster was predicted in 83% of the studied genomes.
3.2 Overview of lipopeptide BGCs produced by Bacillus
The literature search revealed diverse non-ribosomal lipopeptides synthesized by the genus Bacillus. The structures of the different molecules are shown in Table 2.
3.3 Prediction of the lipopeptide structures
Within the BGCs detected by antiSMASH, special attention was paid to traits specific to LP synthesis. Thus, a condensation domain allowing the incorporation of a fatty acid (FA), the so-called C-starter, was sought. A C-starter is usually present in LP NRPSs, except for LPs belonging to the iturin and locillomycin families (Figure 1). Since no bioinformatics tool currently enables the prediction of the fatty acid (FA) structure, the work focused on the peptide moieties (Table 3).

Figure 1. Organization of the known non-ribosomal peptide synthetases (NRPS) encoding lipopeptides, siderophore bacillibactin and antibiotic bacitracin in Bacillus. Iterative domains: A, adenylation domain; C, condensation domain; CP, Carrier Protein; E, epimerization domain; TE, thioesterase domain; mal, malonyl-CoA; CAL, Co-enzyme A ligase; KS, ketosynthetase domain; AmT, aminotransferase. Organization of the known NRPSs: (a) iturin A in B. amyloliquefaciens S499, (b) known kurstakin, (c) surfactin in B. velezensis UD6-2, (d) lichenysin A in B. paralicheniformis CP47, (e) bacillibactin in B. aerophilus KJ82, (f) fengycin B in B. paralicheniformis CP47, (g) bacitracin in B. paralicheniformis CP47, (h) bacillomycin D in B. velezensis DMB06, (i) mycosubtilin in B. subtilis T30, (j) locillomycin in B. subtilis 916.

Table 3. Structures of predicted and identified lipopeptides among different genomic sequences of Bacillus strains.
In silico analysis of the genomic sequences revealed the potential production of four lipopeptide families (Table 2). These are the surfactin, iturin, fengycin and kurstakin families, depending on the strain. Of the 123 chromosomes analyzed, 75 (61%) possessed gene clusters responsible for surfactin production, 29 (23%) for iturin, 46 (37%) for fengycin and 19 (15%) for kurstakin (Table 3). No locillomycin biosynthetic gene cluster was detected in the genomes explored in this study. All chromosomes carrying kurstakin BGCs came from species belonging to the Bacillus cereus group. When this cluster was present, the other clusters (surfactin, fengycin, iturin) were absent. A kurstakin variant was detected in Bacillus mycoides BGSC 4BQ1 (Table 3, Figure 2). The variation was found at monomer 6, which is a threonine instead of the Gln or Glu most often found at this position. In addition, 25 chromosomes contained only surfactin gene clusters. No chromosome contained only fengycin or iturin gene clusters. Coexistence of gene clusters was also observed: surfactin and fengycin gene clusters were found together in 21 chromosomes, while surfactin and iturin gene clusters were found together in 4 chromosomes. A total of 25 chromosomes contained surfactin, fengycin and iturin gene clusters (Figure 3). In addition, six chromosomes contained four gene clusters (surfactin, fengycin, iturin, and novel gene clusters). All B. subtilis, B. velezensis, B. licheniformis, B. paralicheniformis, B. amyloliquefaciens, B. pumilus, B. cellulasensis, B. altitudinis, B. spizizenii, B. safensis, B. vallismortis, B. halotolerans, B. inaquosorum, B. stratosphericus, B. aerophilus and B. atrophaeus have gene clusters to produce a known surfactin or variant. In addition, all B. licheniformis and B. paralicheniformis have gene clusters for lichenysin A production (Table 3). All 29 strains carrying iturin gene clusters belong to the B. subtilis group (B. subtilis, B. velezensis, B. amyloliquefaciens, B. spizizenii, B. atrophaeus). Gene clusters for the biosynthesis of bacillomycin D and bacillomycin L, variants of iturin, were detected in the chromosomes of B. subtilis, B. amyloliquefaciens and B. velezensis. In B. atrophaeus and B. spizizenii, chromosomes possess gene clusters for iturin A and mycosubtilin, respectively. For the fengycin family, fengycin A or B gene clusters were detected in one third of the chromosomes of producing strains. In fact, the 6th monomer, which may be Val in the case of fengycin B or Ala in the case of fengycin A, was not always predicted. A fengycin B variant was detected in the genomes of 10 strains belonging to the B. subtilis species (Table 3). However, all the genomes of B. paralicheniformis strains contained fengycin B biosynthesis gene clusters (Table 3). The D-allothreonine observed in fengycin B is predicted to be a D-threonine by antiSMASH because these two threonine isomers can be selected by the same A domain. They differ slightly at the beta carbon: the beta carbon of D-allothreonine (CαD, CβD) belongs to the D series, whereas the beta carbon of D-threonine (CαD, CβL) belongs to the L series.
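The coexistence counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text (25 surfactin-only chromosomes, 21 with surfactin and fengycin, 4 with surfactin and iturin, 25 with all three, out of 123); the variable and function names are hypothetical.

```python
TOTAL_CHROMOSOMES = 123

# Exclusive regions of the Venn diagram, as reported in the text.
exclusive_counts = {
    frozenset({"surfactin"}): 25,
    frozenset({"surfactin", "fengycin"}): 21,
    frozenset({"surfactin", "iturin"}): 4,
    frozenset({"surfactin", "fengycin", "iturin"}): 25,
}


def family_total(family: str) -> int:
    """Number of chromosomes carrying a cluster of the given family."""
    return sum(count for combo, count in exclusive_counts.items()
               if family in combo)


def share(count: int, total: int = TOTAL_CHROMOSOMES) -> float:
    """Percentage of the analyzed chromosomes."""
    return 100 * count / total


for family in ("surfactin", "fengycin", "iturin"):
    n = family_total(family)
    print(f"{family}: {n} chromosomes, {share(n):.1f}%")
```

The per-family totals obtained this way (75, 46 and 29) match the counts given at the start of the paragraph, and the 75 chromosomes in the Venn regions correspond to the n = 75 of Figure 3.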

Figure 2. Organization of the predicted novel NRPSs and NRPS/PKS. Iterative domains: A, adenylation domain; C, condensation domain; CP, Carrier Protein; E, epimerization domain; TE, thioesterase domain; AT, acyltransferase domain; KS, ketosynthetase domain; KR, ketoreductase domain; CAL, Coenzyme A ligase; C surrounded in light blue, condensation domain allowing condensation between a PKS monomer and an NRPS monomer. Predicted amino acid specificity is shown under each A domain. (a) a tetrapeptide in B. cereus CMCC P0011 (plasmid) and B. cereus CMCC P0021 (plasmid); (b) a hexapeptide in B. velezensis ATR2 (chromosome); (c) a heptapeptide in B. velezensis DSYZ (chromosome), B. subtilis SJ-10 (chromosome) and B. velezensis CGMCC 11640 (chromosome); (d) a heptapeptide in B. amyloliquefaciens HM618 (chromosome); (e) an octapeptide in B. cereus CMCC P0011 (plasmid) and B. cereus CMCC P0021 (plasmid); (f) a heptapeptide in B. anthracis CMF9 (chromosome); (g) a hexapeptide in B. amyloliquefaciens WF02 (chromosome), B. velezensis CGMCC 11640 (chromosome) and B. velezensis DSYZ (chromosome); (h) predicted variant of kurstakin in B. mycoides BGSC 4BQ1, whose amino acid composition differs from that of the other predicted kurstakins at monomer 6; (i) predicted variant of fengycin B in the B. vallismortis DSM 11031, B. subtilis UD1022, B. subtilis FUA2231, B. subtilis FUA2232, B. subtilis SRCM103517 and B. subtilis ZD01 genomes. The modular architecture is similar to that of the known fengycin B; the amino acid composition differs from fengycin B at monomer 8.

Figure 3. Prevalence and coexistence of lipopeptide biosynthesis gene clusters among the 123 Bacillus chromosomes analyzed (Venn diagram, n = 75).
3.4 Prevalence of BGC NRPSs in chromosomes of the different species
The 123 strains represent 33 species distributed as follows: 19 species were found in soil samples only, 12 species were found in both soil and fermented foods, and 2 species were found in fermented foods only. Bacillus species found only in soil samples were B. albus, B. arachidis, B. atrophaeus, B. badius, B. bombysepticus, B. cellulasensis, B. gobiensis, B. haikouensis, B. halotolerans, B. methanolicus, B. mycoides, B. inaquosorum, B. stratosphericus, B. thuringiensis, B. toyonensis, B. tropicus, B. anthracis, B. vallismortis and B. aerophilus (Figure 4A). Species found in both soil and fermented foods were B. licheniformis, B. paralicheniformis, B. paranthracis, B. pacificus, B. pumilus, B. safensis, B. spizizenii, B. subtilis, B. velezensis, B. wiedmannii, B. amyloliquefaciens and B. cereus. Two species, B. altitudinis and B. infantis, were found only in fermented foods (Figure 4B). Among genomes of soil Bacillus species, only B. atrophaeus had a gene cluster for iturin biosynthesis. Furthermore, only the genome of B. thuringiensis carried a gene cluster for the biosynthesis of the antibiotic bacitracin and a new gene cluster. Analysis of genomes of the 12 species found in both isolation media (soil and fermented food) revealed the presence of new gene clusters in B. velezensis, B. amyloliquefaciens, B. subtilis and B. cereus. The prevalence of new gene clusters was highest in B. velezensis and B. amyloliquefaciens (Figure 4B), and the genomes of these two species carried 3 new BGC NRPSs. The genome of B. infantis, a species present only in fermented foods, had no NRPS gene cluster (Figure 4B). Moreover, bacitracin biosynthesis genes were carried only by the genomes of B. paralicheniformis and B. thuringiensis. Genomes of B. pacificus, B. cereus, B. wiedmannii and B. paranthracis lacked surfactin biosynthesis genes (Figure 4).

Figure 4. Production capacity of known BGC NRPSs and new gene clusters depending on the species of the genus Bacillus. Strains in (A) are only isolated from soil samples, but in (B), apart from B. infantis and B. altitudinis which are only present in fermented food samples, the other strains can be found in soil or fermented foods. These two, which are only present in fermented food samples, are shown with asterisk in (B).
3.5 Bacitracin production
Among the 123 genomes screened, only 4 (B. paralicheniformis 14DA11, B. paralicheniformis CP47, B. paralicheniformis UBBLI-30, and B. thuringiensis Bt185) bear a BGC corresponding to the biosynthetic pathway for the antibiotic bacitracin. This antibiotic is a semi-cyclic peptide composed of 12 amino acids. In the B. paralicheniformis CP47 chromosome sequence, a BGC encoding an NRPS containing 12 modules was found. These 12 modules correspond exactly to the 12 modules of bacitracin synthetase. In the genomes of B. thuringiensis Bt185, B. paralicheniformis 14DA11, and B. paralicheniformis UBBLI-30, the modular organization and amino acid composition were similar to those of bacitracin A1 (Figure 1), with similarities ranging from 85 to 100%.
3.6 BGC potentially producing new metabolites
A total of 7 new molecules encoded by 7 new gene clusters were predicted in this study, based on the exploration of 123 complete genomes of strains belonging to the genus Bacillus and isolated either from fermented foods or from soil. These newly identified gene clusters could produce new lipopeptide families and new antibiotics. All newly predicted molecules were first compared with those identified subsequently, and then with the non-ribosomal peptides available in the Norine database. Similarities were low, ranging from 28.6 to 50%. The first new gene cluster detected consisted of 3 genes and 4 modules incorporating Thr, X, Asn and Asn monomers, respectively (Figure 2). The amino acid incorporated by module 2 could not be predicted, hence the letter X at this position (Figure 2). The predicted tetrapeptide showed a low similarity of 39.4% to Cis-7-tetradecenoyl-D-Asparagine. This new gene cluster was carried by plasmids from the B. cereus CMCC P0011 and B. cereus CMCC P0021 strains. Two new gene clusters consisting of 6 modules each were detected by exploring the genomes of B. velezensis ATR2, B. velezensis DSYZ, B. velezensis CGMCC 11640 and B. amyloliquefaciens WF02. The hexapeptide predicted in the B. velezensis ATR2 genome was structured as follows: D-Cys-Ser-Cys-Ala-X-D-Asn, and its synthetase contained a CAL (Coenzyme A ligase) domain. The hexapeptide predicted in the genomes of B. velezensis DSYZ, B. velezensis CGMCC 11640 and B. amyloliquefaciens WF02 was structured as follows: Val-D-Phe-Asp-D-Asn-Gly-D-Glu. Its formation relies on the combination of an NRPS system and a PKS (polyketide synthase) system (Figure 2). CAL and C-starter domains were not detected for this hexapeptide. Therefore, it is probably an antibiotic. Literature searches and structural analyses in Norine did not reveal any similar molecules.
In silico analyses also revealed the presence of 3 new NRPS clusters capable of producing 3 different heptapeptides depending on the modular organization and amino acid composition of each. The first predicted heptapeptide detected was carried by B. velezensis DSYZ and B. subtilis SJ-10 genomes and is structured as follows: D-Cys-Ser-Cys-Ala-X-Asn-D-Asn (Figure 2). Monomer 5 has not been predicted. A CAL domain was detected at the start of the peptide chain (Figure 2). Structural analysis of this heptapeptide in Norine showed little similarity to iturin A1 (32.3%). Furthermore, literature searches revealed no molecules with a similar structure. For the last two predicted heptapeptides, each gene cluster consists of 7 modules incorporating 7 amino acids. One had the following structure: D-Phe-D-Leu-Phe-D-Thr-Val-Ala-Thr and was carried by the B. amyloliquefaciens HM618 genome while the other had the following structural architecture: D-Phe-Tyr-Ile-X-D-Phe-Leu-Leu and was carried by the B. anthracis CMF9 genome (Figure 2). Their structures were similar to kahalalide A (39.4%) and axinastatin 5 (40%) respectively. The heptapeptide identified in the B. amyloliquefaciens HM618 genome contained a thioesterase (TE) domain marking the end of the peptide chain. The heptapeptide identified in the B. anthracis genome contained a CAL domain at the start of the peptide chain and a C condensation domain at the end of module 7 (Figure 2).
Exploration of the plasmid sequences of B. cereus CMCC P0011 and B. cereus CMCC P0021 revealed a BGC probably responsible for the synthesis of a new nonribosomal octapeptide (Figure 2). Monomers whose chemical structure could not be predicted are represented by the letter X in the peptide chain. The structure search in Norine showed limited similarity (50%) with cyanostatin B. No molecules with a similar structure were found in the literature.
4 Discussion
Bacillus species are known to produce a large variety of secondary metabolites, and this production is influenced by environmental conditions. This study therefore explored 123 complete genomes of Bacillus isolated from soil and fermented foods. In silico analysis of chromosomal and plasmid sequences using bioinformatics tools specific to non-ribosomal peptides revealed potential gene clusters responsible for the biosynthesis of lipopeptides (surfactin, fengycin, iturin, kurstakin), the antibiotic bacitracin and the siderophore bacillibactin. Only B. subtilis 916 has been reported to produce locillomycin (Luo et al., 2015a), and no locillomycin gene cluster was detected in this study. Surfactins are lipoheptapeptides with variants such as esperin, lichenysin and pumilacidin (Waongo et al., 2023). Two distinct amino acids were predicted to be incorporated by the first module: either glutamate (Glu) or glutamine (Gln) (Hu et al., 2019). As for iturins, they are lipoheptapeptides whose main variants are iturins A and C, bacillomycins D, F and L, mycosubtilin and mojavensin (Waongo et al., 2023). Bacillibactin is a siderophore whose sequence is Dhb-Gly-Thr (May et al., 2001). The modular organization and monomer composition of the predicted peptides from the surfactin and iturin groups, as well as bacillibactin, were consistent with literature data. Compared with fengycin, the lipodecapeptide predicted in the genomes of B. subtilis UD1022, B. vallismortis DSM11031, B. subtilis ZD01, B. halotolerans ZB201702, B. mojavensis B-41812, B. mojavensis B-41341, B. subtilis MEC_B298, B. subtilis s-16, B. subtilis FUA2231, B. subtilis FUA2232, B. subtilis SRCM103517, and B. subtilis BSP1 contains a glutamic acid (Glu) at position 8. In contrast, various authors have reported that module 8 incorporates glutamine (Gln) during peptide chain formation (Ait Kaki et al., 2020; Hussein, 2019). This observed variability suggests that certain Bacillus strains could produce a fengycin B variant.
Also, structural analysis of the predicted kurstakins shows the presence of a threonine (Thr) at position 6 in the B. mycoides BGSC 4BQ1 genome. Generally, the amino acid occupying this position in kurstakin is glutamine (Gln) (Béchet et al., 2012). Although monomers 2 and 6 were not predicted in the genomes of other B. cereus group species (B. thuringiensis, B. cereus, B. wiedmannii, B. bombysepticus), it is quite possible that B. mycoides BGSC 4BQ1 is capable of producing a kurstakin variant. Nevertheless, the two putative variants identified through this in silico screening approach will need to be further investigated through in vitro experiments.
This study revealed the coexistence of lipopeptide biosynthesis genes. Indeed, Luo et al. (2015b) reported the coexistence of surfactin (srf), bacillomycin (bmy), fengycin (fen) and locillomycin (loc) gene clusters in the B. subtilis 916 genome. In this study, the B. amyloliquefaciens WF02, B. amyloliquefaciens HM618, B. velezensis ATR2, B. velezensis CGMCC 11640, B. velezensis Lzh-a42, B. subtilis SJ-10 and B. velezensis DSYZ genomes also contained three known lipopeptide biosynthesis gene clusters and a novel gene cluster that could produce a new lipopeptide family. To our knowledge, no study has demonstrated the coexistence of four or five lipopeptide gene clusters in genomes of species such as B. amyloliquefaciens and B. velezensis. According to Luo et al. (2015b), the co-production of surfactin, fengycin, iturin and locillomycin by B. subtilis 916 underlies its inhibitory capacity against multi-resistant Staphylococcus aureus. Similarly, the co-production of lipopeptides would reduce the hemolytic activity of the producing strains (Luo et al., 2019; Waongo et al., 2023).
The emergence of strains resistant to multiple commonly used antibiotics is a major challenge (Eduardo-Correia et al., 2020), and the exploration of new molecules that could serve as alternatives remains a necessity. This study identified seven new gene clusters (NRPS and NRPS/PKS), two of which were carried by plasmid sequences. These new gene clusters are predicted to direct the biosynthesis of a tetrapeptide, two hexapeptides, three heptapeptides and an octapeptide. The predicted tetrapeptide and octapeptide are encoded by gene clusters carried by plasmids. The structural architecture and monomer composition of all predicted molecules differed from the peptides available in the database (Flissi et al., 2023). Furthermore, literature searches did not reveal any molecules with similar structure and monomer composition. Except for the hexapeptide predicted in the chromosomes of B. velezensis DSYZ and B. amyloliquefaciens WF02, all the others carried a CAL or C-starter domain in the first module. Lipopeptides are made up of chains of amino acids and fatty acids, and their synthetases are characterized by the presence of C-starter or CAL domains, which play a role in fatty acid chain activation (Ongena and Jacques, 2007). Thus, the presence of such a domain in the first module suggests that six of the seven predicted new peptides could be lipopeptides. However, certain purely peptidic antibiotics of clinical interest, such as vancomycin and bacitracin, are characterized by the presence of an adenylation domain (A) in the first module (Konz et al., 1997; Wageningen et al., 1998). This A domain is responsible for selecting and activating the first amino acid to be integrated into the growing peptide chain. Consequently, the new hexapeptide predicted in the chromosomes of B. velezensis DSYZ and B. amyloliquefaciens WF02 would be a peptide antibiotic. Esmaeel et al. (2016) similarly reported a new lipopeptide identified by analyzing genomes of the genus Burkholderia.
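The reasoning in this paragraph amounts to a simple rule on the first domain of the assembly line, which can be written as a short Python sketch. The function name `putative_class` and the dictionary layout are hypothetical. The first-domain labels are taken from the descriptions in Section 3.6, except that the "A" assigned to the DSYZ/WF02 hexapeptide is an assumption inferred from the absence of CAL and C-starter domains. The rule is only a heuristic and does not replace experimental characterization.

```python
def putative_class(first_domain: str) -> str:
    """Heuristic class of an NRP from the first domain of its synthetase."""
    fatty_acid_loaders = {"CAL", "C-starter"}
    if first_domain in fatty_acid_loaders:
        return "putative lipopeptide"
    if first_domain == "A":
        return "putative peptide antibiotic"
    return "unclassified"


# First domains as described in the Results (the "A" entry is assumed).
predicted_peptides = {
    "hexapeptide (B. velezensis ATR2)": "CAL",
    "heptapeptide (B. velezensis DSYZ, B. subtilis SJ-10)": "CAL",
    "heptapeptide (B. anthracis CMF9)": "CAL",
    "hexapeptide (B. velezensis DSYZ, B. amyloliquefaciens WF02)": "A",
}

for name, domain in predicted_peptides.items():
    print(f"{name}: {putative_class(domain)}")
```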
On the other hand, NRPS gene clusters responsible for lipopeptide synthesis are rarely found on plasmids. To our knowledge, no study has revealed the presence of NRPS lipopeptide synthetase genes on a Bacillus plasmid; they have been reported only on a Burkholderia plasmid (Esmaeel et al., 2016). As a result, the putative new BGCs could encode new families of lipopeptides as well as new antibiotics, which could have unique and interesting biological properties (e.g., antifungal, antibacterial, antiviral). The biomolecules newly predicted in this study could become a focus of research with a view to their future use in fields such as agri-food, cosmetics and medicine. The results of this study reveal that the genomes of Bacillus strains available in databases encode many unknown molecules that could play important roles in antimicrobial control. However, the biological functions of all these new molecules remain to be elucidated. Although the bioinformatic discovery of novel molecules often captures attention, rigorous experimental validation remains the crucial step in translating promising predictions into tangible scientific findings.
5 Conclusion
Exploration of microbial genomes allows the rapid identification of chromosomal gene clusters coding for new beneficial molecules. In addition to chromosomes, it would be worthwhile to analyze plasmids, which may also carry BGCs for NRPs or lipopeptides. This study identified seven new gene clusters predicted to synthesize new non-ribosomal peptides. Our results suggest that several NRPs that could be produced by Bacillus strains isolated from fermented foods and soil remain uncharacterized, and their properties unknown. Targeted experimental research should therefore be conducted to validate these predicted molecules.
Data availability statement
Genome mining was based on the whole genome sequences of the 123 Bacillus strains listed in Table 1, which can be obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). All other data supporting the findings are contained within the manuscript.
Ethics statement
The manuscript presents research on animals that do not require ethical approval for their study.
Author contributions
BW: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. LN: Investigation, Methodology, Writing – review & editing. FT: Methodology, Writing – review & editing. W-SAZ: Methodology, Writing – review & editing. JL: Supervision, Writing – review & editing. AS: Supervision, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research and/or publication of this article.
Acknowledgments
We would like to thank University Joseph KI-ZERBO of Burkina Faso and Huazhong Agricultural University of China for their financial support.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Gen AI was used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Ait Kaki, A., Smargiasso, N., Ongena, M., Kara Ali, M., Moula, N., de Pauw, E., et al. (2020). Characterization of new fengycin cyclic lipopeptide variants produced by Bacillus amyloliquefaciens (ET) originating from a salt lake of eastern Algeria. Curr. Microbiol. 77, 443–451. doi: 10.1007/s00284-019-01855-w
Aleti, G., Sessitsch, A., and Brader, G. (2015). Genome mining: prediction of lipopeptides and polyketides from Bacillus and related Firmicutes. Comput. Struct. Biotechnol. J. 13, 192–203. doi: 10.1016/j.csbj.2015.03.003
Aslam, B., Khurshid, M., Arshad, M. I., Muzammil, S., Rasool, M., Yasmeen, N., et al. (2021). Antibiotic resistance: one health one world outlook. Front. Cell. Infect. Microbiol. 11, 1–20. doi: 10.3389/fcimb.2021.771510
Béchet, M., Caradec, T., Hussein, W., Abderrahmani, A., Chollet, M., Leclère, V., et al. (2012). Structure, biosynthesis, and properties of kurstakins, nonribosomal lipopeptides from Bacillus spp. Appl. Microbiol. Biotechnol. 95, 593–600. doi: 10.1007/s00253-012-4181-2
Blin, K., Shaw, S., Augustijn, H. E., Reitz, Z. L., Biermann, F., Alanjary, M., et al. (2023). antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res. 51, W46–W50. doi: 10.1093/nar/gkad344
Blin, K., Shaw, S., Kloosterman, A. M., Charlop-Powers, Z., Van Wezel, G. P., Medema, M. H., et al. (2021). AntiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35. doi: 10.1093/nar/gkab335
Bonmatin, J.-M., Laprevote, O., and Peypoux, F. (2003). Diversity among microbial cyclic lipopeptides: iturins and surfactins. Activity-structure relationships to design new bioactive agents. Comb. Chem. High Throughput Screen. 6, 541–556. doi: 10.2174/138620703106298716
Duban, M., Cociancich, S., and Leclère, V. (2022). Nonribosomal peptide synthesis definitely working out of the rules. Microorganisms. 10, 1–19. doi: 10.3390/microorganisms10030577
Eduardo-Correia, B., Morales-Filloy, H., and Abad, J. P. (2020). Bacteria from the multi-contaminated Tinto river estuary (SW, Spain) show high multi-resistance to antibiotics and point to Paenibacillus spp. as antibiotic-resistance-dissemination players. Front. Microbiol. 10, 1–18. doi: 10.3389/fmicb.2019.03071
Esmaeel, Q., Pupin, M., Kieu, N. P., Chataigné, G., Béchet, M., Deravel, J., et al. (2016). Burkholderia genome mining for nonribosomal peptide synthetases reveals a great potential for novel siderophores and lipopeptides synthesis. MicrobiologyOpen. 5, 512–526. doi: 10.1002/mbo3.347
Fanaei, M., Jurcic, K., and Emtiazi, G. (2021). Detection of simultaneous production of kurstakin, fengycin and surfactin lipopeptides in Bacillus mojavensis using a novel gel-based method and MALDI-TOF spectrometry. World J. Microbiol. Biotechnol. 37, 1–11. doi: 10.1007/s11274-021-03064-9
Flissi, A., Duban, M., Jacques, P., Leclère, V., and Pupin, M. (2023). Norine: bioinformatics methods and tools for the characterization of newly discovered nonribosomal peptides. Methods Mol. Biol. 2670, 303–318. doi: 10.1007/978-1-0716-3214-7_16
Flissi, A., Ricart, E., Campart, C., Chevalier, M., Dufresne, Y., Michalik, J., et al. (2020). Norine: update of the nonribosomal peptide resource. Nucleic Acids Res. 48, D465–D469. doi: 10.1093/nar/gkz1000
Hathout, Y., Ho, Y., Ryzhov, V., Demirev, P., and Fenselau, C. (2000). Kurstakins: a new class of lipopeptides isolated from Bacillus thuringiensis. J. Nat. Prod. 63, 1492–1496. doi: 10.1021/np000169q
Hu, F., Liu, Y., and Li, S. (2019). Rational strain improvement for surfactin production: enhancing the yield and generating novel structures. Microb. Cell Factories. 18, 1–13. doi: 10.1186/s12934-019-1089-x
Huan, Y., Kong, Q., Mou, H., and Yi, H. (2020). Antimicrobial peptides: classification, design, application and research progress in multiple fields. Front. Microbiol. 11, 1–21. doi: 10.3389/fmicb.2020.582779
Hussein, W. (2019). Fengycin or plipastatin? A confusing question in bacilli. Biotechnologia. 100, 47–55. doi: 10.5114/bta.2019.83211
Iqbal, S., Ullah, N., and Janjua, H. A. (2021). In vitro evaluation and genome mining of Bacillus subtilis strain RS10 reveals its biocontrol and plant growth-promoting potential. Agriculture. 11:1273. doi: 10.3390/agriculture11121273
Jin, P., Wang, H., Liu, W., Fan, Y., and Miao, W. (2017). A new cyclic lipopeptide isolated from Bacillus amyloliquefaciens HAB-2 and safety evaluation. Pestic. Biochem. Physiol. 147, 40–45. doi: 10.1016/j.pestbp.2017.08.015
Kato, T., and Shoji, J. (1980). The structure of octapeptin D (studies on antibiotics from the genus Bacillus. XXVIII). J. Antibiot. 33, 186–191. doi: 10.7164/antibiotics.33.186
Khurana, H., Sharma, M., Verma, H., Lopes, B. S., Lal, R., and Negi, R. K. (2020). Genomic insights into the phylogeny of Bacillus strains and elucidation of their secondary metabolic potential. Genomics. 112, 3191–3200. doi: 10.1016/j.ygeno.2020.06.005
Konz, D., Klens, A., Schörgendorfer, K., and Marahiel, M. A. (1997). The bacitracin biosynthesis operon of Bacillus licheniformis ATCC 10716: molecular characterization of three multi-modular peptide synthetases. Chem. Biol. 4, 927–937. doi: 10.1016/S1074-5521(97)90301-X
Lam, V. B., Meyer, T., Arias, A. A., Ongena, M., Oni, F. E., and Höfte, M. (2021). Bacillus cyclic lipopeptides iturin and fengycin control rice blast caused by Pyricularia oryzae in potting and acid sulfate soils by direct antagonism and induced systemic resistance. Microorganisms 9, 1–25. doi: 10.3390/microorganisms9071441
Leclère, V., Weber, T., Jacques, P., and Pupin, M. (2016). Bioinformatics tools for the discovery of new nonribosomal peptides. Methods Mol. Biol. 1401, 209–232. doi: 10.1007/978-1-4939-3375-4_14
Lee, S. C., Kim, S. H., Park, I. H., Chung, S. Y., and Choi, Y. L. (2007). Isolation and structural analysis of bamylocin A, novel lipopeptide from Bacillus amyloliquefaciens LP03 having antagonistic and crude oil-emulsifying activity. Arch. Microbiol. 188, 307–312. doi: 10.1007/s00203-007-0250-9
Liu, R., Zhang, D., Li, Y., Tao, L., and Tian, L. (2010). A new antifungal cyclic lipopeptide from Bacillus marinus B-9987. Helv. Chim. Acta. 93, 2419–2425. doi: 10.1002/hlca.201000094
Luo, C., Chen, Y., Liu, X., Wang, X., Wang, X., Li, X., et al. (2019). Engineered biosynthesis of cyclic lipopeptide locillomycins in surrogate host Bacillus velezensis FZB42 and derivative strains enhance antibacterial activity. Appl. Microbiol. Biotechnol. 103, 4467–4481. doi: 10.1007/s00253-019-09784-1
Luo, C., Liu, X., Zhou, X., Guo, J., Truong, J., Wang, X., et al. (2015a). Unusual biosynthesis and structure of locillomycins from Bacillus subtilis 916. Appl. Environ. Microbiol. 81, 6601–6609. doi: 10.1128/AEM.01639-15
Luo, C., Liu, X., Zhou, H., Wang, X., and Chen, Z. (2015b). Nonribosomal peptide synthase gene clusters for lipopeptide biosynthesis in Bacillus subtilis 916 and their phenotypic functions. Appl. Environ. Microbiol. 81, 422–431. doi: 10.1128/AEM.02921-14
Ma, Z., Wang, N., Hu, J., and Wang, S. (2012). Isolation and characterization of a new iturinic lipopeptide, mojavensin A produced by a marine-derived bacterium Bacillus mojavensis B0621A. J. Antibiot. 65, 317–322. doi: 10.1038/ja.2012.19
May, J. J., Wendrich, T. M., and Marahiel, M. A. (2001). The dhb operon of Bacillus subtilis encodes the biosynthetic template for the catecholic siderophore 2,3-dihydroxybenzoate-glycine-threonine trimeric ester bacillibactin. J. Biol. Chem. 276, 7209–7217. doi: 10.1074/jbc.M009140200
Meyers, E., Parker, W. L., and Brown, W. E. (1976). A nomenclature proposal for the octapeptin antibiotics. J. Antibiot. 29, 1241–1242. doi: 10.7164/antibiotics.29.1241
Naruse, N., Tenmyo, O., Kobaru, S., Kamei, H., Miyaki, T., Konishi, M., et al. (1990). Pumilacidin, a complex of new antiviral antibiotics: production, isolation, chemical properties, structure and biological activity. J. Antibiot. 43, 267–280. doi: 10.7164/antibiotics.43.267
Nishikiori, T., Naganawa, H., Muraoka, Y., Aoyagi, T., and Umezawa, H. (1986). Plipastatins: new inhibitors of phospholipase A2, produced by Bacillus cereus BMG302-fF67. J. Antibiot. 39, 755–761. doi: 10.7164/antibiotics.39.755
Ongena, M., and Jacques, P. (2007). Bacillus lipopeptides: versatile weapons for plant disease biocontrol. Trends Microbiol. 16, 115–125. doi: 10.1016/j.tim.2007.12.009
Pathak, K. V., and Keharia, H. (2014). Identification of surfactins and iturins produced by potent fungal antagonist, Bacillus subtilis K1 isolated from aerial roots of banyan (Ficus benghalensis) tree using mass spectrometry. 3 Biotech 4, 283–295. doi: 10.1007/s13205-013-0151-3
Peypoux, F., Marion, D., Maget-Dana, R., Ptak, M., Das, B. C., and Michel, G. (1985). Structure of bacillomycin F, a new peptidolipid antibiotic of the iturin group. Eur. J. Biochem. 153, 335–340. doi: 10.1111/j.1432-1033.1985.tb09307.x
Peypoux, F., Pommier, M. T., Marion, D., Ptak, M., Das, B. C., and Michel, G. (1986). Revised structure of mycosubtilin, a peptidolipid antibiotic from Bacillus subtilis. J. Antibiot. 39, 636–641. doi: 10.7164/antibiotics.39.636
Rahman, M. A., Noore, M. S., Hasan, M. A., Ullah, M. R., Rahman, M. H., Hossain, M. A., et al. (2014). Identification of potential drug targets by subtractive genome analysis of Bacillus anthracis A0248: an in silico approach. Comput. Biol. Chem. 52, 66–72. doi: 10.1016/j.compbiolchem.2014.09.005
Sang-cheol, L., Kim, S., Park, I., Chung, S., Chandra, M. S., and Choi, Y. (2010). Isolation, purification, and characterization of novel fengycin S from Bacillus amyloliquefaciens LSC04 degrading-crude oil. Biotechnol. Bioprocess Eng. 15, 246–253. doi: 10.1007/s12257-009-0037-8
Shoji, J., and Kato, T. (1976). The structure of cerexin B (studies of antibiotics from the genus Bacillus. XVII). J. Antibiot. 29, 1275–1280.
Shoji, J., Kato, T., Matsumoto, K., Takahashi, Y., and Mayama, M. (1976a). Production and isolation of cerexins C and D (studies on antibiotics from the genus Bacillus. XVIII). J. Antibiot. 29, 1281–1285. doi: 10.7164/antibiotics.29.1268
Shoji, J., Kato, T., and Sakazaki, R. (1976b). The total structure of cerexin A (studies on antibiotics from the genus Bacillus. XVI). J. Antibiot. 29, 1268–1274. doi: 10.7164/antibiotics.29.1275
Tamang, J. P., Shin, D.-H., Jung, S. J., and Chae, S.-W. (2016). Functional properties of microorganisms in fermented foods. Front. Microbiol. 7, 1–13. doi: 10.3389/fmicb.2016.00578
Tapi, A., Chollet, M., Scherens, B., and Jacques, P. (2010). New approach for the detection of non-ribosomal peptide synthetase genes in Bacillus strains by polymerase chain reaction. Appl. Microbiol. Biotechnol. 85, 1521–1531. doi: 10.1007/s00253-009-2176-4
Thasana, N., Prapagdee, B., Rangkadilok, N., Sallabhan, R., Aye, S. L., Ruchirawat, S., et al. (2010). Bacillus subtilis SSE4 produces subtulene A, a new lipopeptide antibiotic possessing an unusual C15 unsaturated β-amino acid. FEBS Lett. 584, 3209–3214. doi: 10.1016/j.febslet.2010.06.005
Théatre, A., Cano-Prieto, C., Bartolini, M., Laurin, Y., Deleu, M., Niehren, J., et al. (2021). The surfactin-like lipopeptides from Bacillus spp.: natural biodiversity and synthetic biology for a broader application range. Front. Bioeng. Biotechnol. 9:623701. doi: 10.3389/fbioe.2021.623701
Thomas, D. W., and Ito, T. (1969). The revised structure of the peptide antibiotic esperin, established by mass spectrometry. Tetrahedron. 25, 1985–1990. doi: 10.1016/S0040-4020(01)82819-2
Vater, J., Kablitz, B., Wilde, C., Franke, P., Mehta, N., and Cameotra, S. S. (2002). Matrix-assisted laser desorption ionization-time of flight mass spectrometry of lipopeptide biosurfactants in whole cells and culture filtrates of Bacillus subtilis C-1 isolated from petroleum sludge. Appl. Environ. Microbiol. 68, 6210–6219. doi: 10.1128/AEM.68.12.6210-6219.2002
Volpon, L., Tsan, P., Majer, Z., Vass, E., Hollósi, M., Noguéra, V., et al. (2007). NMR structure determination of a synthetic analogue of bacillomycin Lc reveals the strategic role of l-Asn1 in the natural iturinic antibiotics. Spectrochim. Acta A Mol. Biomol. Spectrosc. 67, 1374–1381. doi: 10.1016/j.saa.2006.10.027
Wageningen, A. A., Kirkpatrick, P. N., Williams, D. H., Harris, B. R., Kershaw, J. K., Lennard, N. J., et al. (1998). Sequencing and analysis of genes involved in the biosynthesis of a vancomycin group antibiotic. Chem. Biol. 5, 155–162. doi: 10.1016/S1074-5521(98)90060-6
Waongo, B., Pupin, M., Duban, M., Chataigne, G., Zongo, O., Cisse, H., et al. (2023). Kawal: a fermented food as a source of Bacillus strain producing antimicrobial peptides. Sci. Afr. 20:e01714. doi: 10.1016/j.sciaf.2023.e01714
Zhang, H. L., Hua, H. M., Pei, Y. H., and Yao, X. S. (2004). Three new cytotoxic cyclic acylpeptides from marine Bacillus sp. Chem. Pharm. Bull. 52, 1029–1030. doi: 10.1248/cpb.52.1029
Keywords: Bacillus, nonribosomal peptide, lipopeptide, biosynthetic gene clusters, genome mining
Citation: Waongo B, Ndayishimiye L, Tapsoba F, Zongo W-SA, Li J and Savadogo A (2025) Prospection for potential new non-ribosomal peptide gene clusters in Bacillus genus isolated from fermented foods and soil through genome mining. Front. Microbiol. 16:1515483. doi: 10.3389/fmicb.2025.1515483
Edited by:
Jørgen J. Leisner, University of Copenhagen, Denmark
Reviewed by:
Maria Carla Martini, Worcester Polytechnic Institute, United States
Sajid Iqbal, Oujiang Laboratory (Zhejiang Laboratory for Regenerative Medicine, Vision, and Brain Health), China
Copyright © 2025 Waongo, Ndayishimiye, Tapsoba, Zongo, Li and Savadogo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Blaise Waongo
†ORCID: François Tapsoba, https://orcid.org/0000-0001-6964-0442