Elucidating the genomic history of commercially used Bacillus thuringiensis subsp. tenebrionis strain NB176

Bacillus thuringiensis subsp. tenebrionis (Btt) produces a coleopteran-specific crystal protoxin protein (Cry3Aa δ-endotoxin). After its discovery in 1982, the strain NB125 (DSM 5526) was eventually registered in 1990 to control the Colorado potato beetle (Leptinotarsa decemlineata). Gamma-irradiation of NB125 resulted in strain NB176-1 (DSM 5480) that exhibited higher cry3Aa production and became the active ingredient of the plant protection product Novodor® FC. Here, we report a comparative genome analysis of the parental strain NB125, its derivative NB176-1 and the current commercial production strain NB176. The entire genome sequences of the parental and derivative strains were deciphered by a hybrid de novo approach using short (Illumina) and long (Nanopore) read sequencing techniques. Genome assembly revealed a chromosome of 5.4 to 5.6 Mbp and six plasmids with a size range from 14.9 to 250.5 kbp for each strain. The major differences among the original NB125 and the derivative strains NB176-1 and NB176 were an additional copy of the cry3Aa gene, which translocated to another plasmid as well as a chromosomal deletion (~ 178 kbp) in NB176. The assembled genome sequences were further analyzed in silico for the presence of virulence and antimicrobial resistance (AMR) genes.

1 Introduction epithelial cell membrane, leading to intestinal perforation, feeding stop and finally insect death (Schnepf et al., 1998;Bravo et al., 2007). The d-endotoxins are encoded by cry and/or cyt genes, typically located on large plasmids (Carlton, 1980;Kronstad et al., 1983;Carlson et al., 1996). With the discovery of Bt strains specific to Lepidoptera, Diptera or Coleoptera, many Bt strains have been successfully developed and used as highly specific biological control agents (Bravo et al., 2007).
Soon after its discovery, several Btt based plant protection products were placed on the market in the US, Canada, and Europe (Gelernter, 2004). One of these commercial products was Novodor ® , containing the production strain NB125 (DSM 5526), a subculture of BI 256-82, which was registered in 1990 to control the Colorado potato beetle (Leptinotarsa decemlineata) (Gelernter, 2004). By Gamma-irradiation of NB125, NB176-1 (DSM 5480) was produced, which differed from NB125 by (i) oligosporogeny, (ii) a more than twofold increase of Cry3Aa protein production and crystals up to more than five times bigger than the crystals produced by NB125, and hence an increased insecticidal activity (Adams et al., 1994;Gurtler and Petersen, 1994). Based on cloning of the cry3Aa gene and Southern hybridization it had been proposed that a transposition-mediated duplication of the cry3Aa gene caused the Cry3Aa overproduction (Adams et al., 1994). The radiation-derived NB176-1 (DSM 5480) became eventually the new basis of the commercial product Novodor ® FC.
Here, we report the full genomes of the three Btt strains NB125 (DSM 5526), NB176-1 (DSM 5480) and the current production strain NB176, formulated in the plant protection product Novodor ® FC. The sequences were obtained by applying a hybrid de novo assembly approach using a combination of short (Illumina) and long (Nanopore) sequencing reads (Wick et al., 2017b). Comparisons of the three genomes allowed following and understanding the genomic rearrangements that had taken place from the former (parental strain before radiation) to the current production strain used as biocontrol agent (European Food Safety Authority, 2013).

Bacterial strains and cultivation
The parental NB125 (DSM 5526) and radiation-derived strain NB176-1 (DSM 5480) were obtained as freeze-dried cultures from the DSMZ (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany), whereas the current production strain NB176 (formulation Novodor ® FC) was provided as glycerol stock (vegetative cells in 1 ml glycerol) by Valent BioSciences (Libertyville, Illinois, United States) (Table 1).
After receiving the three different Btt strains, they were always handled under sterile conditions to avoid any contamination. Glassware and culture media were autoclaved prior to use and all steps were performed under a sterile workbench. The freeze-dried strains NB125 and NB176-1 were received in tightly sealed glass vials and were handled according to the protocol of the DSMZ. In brief, the glass vials were broken open and the dried strains were mixed with 0.5 ml LB medium (1% tryptone, 0.5% yeast extract, 0.5% NaCl, pH 7.0 ± 0.2; Carl Roth, Karlsruhe, Germany). After 30 min rehydration, 0.25 ml of the suspension were used to inoculate 5 ml liquid LB medium. To recover NB176, the frozen glycerol stock was stabbed using a sterile loop and 5 ml liquid LB medium were directly inoculated. All bacterial cultures were grown at 30°C with shaking at 200 rpm overnight.

Total genomic DNA extraction and whole genome sequencing
Starting from an aliquot of 1 ml overnight culture, total genomic (chromosomal + plasmid) DNA (gDNA) was extracted using the Wizard ® Genomic DNA Purification Kit (Promega, Madison, WI, USA), following the protocol for gram-positive bacteria. For cell lysis, the bacteria were treated with lytic enzyme in a final concentration of 2 mg/ml lysozyme (Sigma-Aldrich, St. Louis, Missouri, USA) for 60 min at 37°C. After DNA precipitation, the gDNA was dissolved in 100 ml dH 2 O and stored at -80°C until further use. The quality of gDNA was verified by measuring the absorbance ratios 260/280 and 260/230 using a Nanodrop 2000c spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Concentration of gDNA was determined with a Quantus Fluorometer (Promega, Madison, WI, USA) and integrity was confirmed by TAE gel electrophoresis (0.8% agarose).
The identical stocks of gDNA were used for both Illumina (short read) and Nanopore (long read) sequencing. Short-read genomic data was obtained from approximately 500 ng gDNA per sample using a 150 bp paired-end protocol on an Illumina NextSeq 2000 sequencer (StarSEQ GmbH, Mainz, Germany). The genome size was estimated with 6 Mbp and between 6 to 7 M reads were ordered to achieve an estimated 150 to 175-fold read depth. The quality of the raw sequencing data was validated with FastQC (v0.72) (Andrews, 2010) and adapter and quality trimming (Phred quality score ≥ 30; minimum read length after trimming = 20 nt) was performed using Trim Galore! (v0.6.3.) (Krueger, 2015).
For long-read sequencing, the MinION sequencer (Oxford Nanopore Technologies, Oxford, UK) was utilized. Extracted gDNA was concentrated to approximately 30 ng/µl by vacuum evaporation using an Eppendorf Concentrator plus (Eppendorf, Hamburg, Germany) with the following options: 30°C, brake = off and V-AQ. Multiplexing and library construction were performed using the Native Barcoding Expansion Kit (EXP-NBD104) and Ligation Sequencing Kit (SQK-LSK109), following the manufactures instructions with minor modifications: To increase DNA recovery, freshly prepared 80% ethanol was used throughout the protocol. Incubation times for gDNA binding to and elution from magnetic beads were extended to 15 min. All elution steps were conducted at 37°C. After adapter ligation, large DNA fragments longer than 3 kbp were enriched using the Long Fragment Buffer (LFB), provided by the kit. The prepared library, consisting of three barcoded DNA samples, was loaded and sequenced on a MinION R9.4.1 flow cell (Oxford Nanopore Technologies Ltd., Oxford, UK) for 68 hours to receive approximately 5 M reads (15.76 Gb). Basecalling, demultiplexing and adapter trimming of long nanopore reads were performed using Guppy (v4.3.4). Subsequently, reads were filtered on minimum read quality (Q = 10) and minimum read length (≥1000 nt) using NanoFilt (v2.3.0) (De Coster et al., 2018). For each sample, basic statistics of long read sequencing data before and after filtering were calculated with NanoStat (v1.1.2.) (De Coster et al., 2018).

Hybrid de novo genome assembly
For the assembly of the Btt genomes (chromosomes and plasmids) the Unicycler tool (v0.4.8) was used, a SPAdes algorithm based genome assembler with varying k-mer sizes (Wick et al., 2017b). Default parameters and normal bridging mode were selected as settings for the Unicycler, using the quality filtered and adapter trimmed short (Illumina) and long read (MinION) sequencing data. The assembled genomes were visualized with Bandage (v0.8.1.) (Wick et al., 2015) and were further evaluated using QUAST (v5.0.2.) (Gurevich et al., 2013). To resolve the entire genomic structure, manual multiplicity adjustments were required for both NB176-1 and NB176 due to large inter-plasmidic repeats. Briefly, the best SPAdes graphs obtained from the hybrid de novo assemblies were visualized in Bandage. Subsequently, regions in question (e.g. assembled linear plasmid contig) were BLAST searched in Bandage and multiplicity calls were inspected using Unicycler's graph color scheme (green = single-copy, yellow = two-copy, orange = three-copy, red = multicopy). In regions where mis-assemblies have occurred, the multiplicities were adjusted manually following the protocol of the Unicycler webpage (https://github.com/rrwick/Unicycler/wiki/ Multiplicity-mistake). The modified best SPAdes graph was then used as input for a follow-up hybrid assembly using Unicycler. For NB176 further manual curation was needed, since two plasmids collapsed into a shared repeat sequence, although both plasmids sequences were available in Unicycler's final assembly graph. To finish the assembly and separate the two plasmids based on read depth, contigs were extracted and merged according to their graph path. Each completed assembly was further polished with the integrated tool Pilon (v1.20) (Walker et al., 2014) using the paired, highly accurate Illumina short reads.

Evaluation of genome assemblies
The quality of the assembly was evaluated by mapping paired Illumina reads back to the de novo assembled genome using BWA-MEM (Li and Durbin, 2009;Li, 2013). Proportion of mapped and  (Adams et al., 1994;Gurtler and Petersen, 1994) NB176-1 (DSM 5480) Aug. 10, 1989 freeze-dried DSMZ (Adams et al., 1994;Gurtler and Petersen, 1994 unmapped reads was calculated for each strain using samtools (v1.9.). To examine the reads, which did not map to their respective assembly, unmapped paired-end reads were extracted from BWA-MEM output and de novo assembled into contigs. Taxonomy was assigned to each contig by blastn search against the nt NCBI database. Coverage and depth of mapping (mean ± standard deviation [SD]) were calculated for each replicon (chromosome and plasmids) within each strain. Coverage plots were visually inspected in Geneious Prime (v2021.2.2) (Biomatters Inc., San Diego, CA, USA) to detect ambiguous regions in the assembly indicating possible errors in the assembled sequences. Circularity of replicons was verified by selecting reads spanning from both contig ends.
To assess assembly completeness in terms of gene content, BUSCO (v5.2.2.; genome mode) (Simão et al., 2015) with lineagespecific ortholog set for Bacillales was used (bacillales_odb10, 2021-02-23). Missing and duplicated BUSCO genes were reported, as they may not be of technical but biological origin such as gene loss and gene duplication.

Multiple sequence alignment and verification of nucleotide deletions
Multiple sequence alignment (MSA) of NB125, NB176-1 and NB176 was performed using Mauve progressive algorithm (Darling et al., 2010) in Geneious Prime (v2021.2.2). Each replicon (bacterial chromosome, plasmids) was considered individually and compared between all strains carrying that replicon. After MSA, indels were left realigned manually to correct gaps introduced by the aligner. To compare replicon sequences, sequence identity was calculated. Here, sequence identity [%] was defined as the percentage of bases that are identical in an alignment. For bacterial chromosome comparison, alignment was performed twice: An additional sequence identity analysis was performed after excluding a chromosomal deletion from the alignment. To verify detected nucleotide deletions in genome assemblies, paired Illumina reads were mapped with BWA-MEM against the de novo assembled genome sequence of the parental NB125. Samtools depth (Galaxy version 1.9) was used to compute the read depth at each genomic position and the output file was loaded in R (v4.1.2.) using RStudio (v2021.09.2) to create a coverage plot for the corresponding replicon.

Genome annotation and homology analysis of predicted coding sequences
Automatic annotation of protein coding DNA sequences (CDS) and RNA genes, transfer RNA (tRNA), ribosomal RNA (rRNA) as well as transfer messenger RNA (tmRNA), was performed using Prokka software (v1.14.6, default settings) (Seemann, 2014). Ambiguities in the chromosome annotation between the three strains were identified by aligning the entire chromosome including annotations with Mauve progressive algorithm in Geneious. One CDS in NB176 was not annotated automatically although the complete sequence was available (100% sequence identity). To adjust the annotation, the CDS was added manually.
In the next step, annotated gene sets (GFF3) were used for homology analysis of CDS: Each replicon (bacterial chromosome, plasmids) was considered individually and the predicted CDS were compared between all strains carrying that replicon. To identify CDS homologies, the fully automated Roary software (v3.13, default settings, minimum percentage amino acid identity = 95%, splitparalogs, 100% core genes threshold) was used (Page et al., 2015). Occurring paralogous genes were analyzed individually (paralog splitting) to investigate not only sequence homologies but also their genomic position. The number of shared (= present in all three strains), variable (= present in two strains) and unique (= present in a single strain) CDS were determined. Following this computer based CDS comparison, detected homologies and CDS identified as variable or unique were verified manually. Gene presence/absence files created by Roary software were further analyzed in R (v4.1.2.) using RStudio (v2021.09.2).
Since all annotated features (CDS + RNA genes) were numbered consecutively by Prokka software, the location and annotation of RNA genes were also included in the final gene presence/absence files (Supplementary Tables 1-8). Homology of RNA genes was not investigated.

Analysis of CDS located on a chromosomal deletion
It was assessed if copies of the CDS located on a deleted stretch of chromosomal DNA can be found somewhere else in the corresponding genome. The set of deleted CDS was extracted and batch searched against the corresponding genome using MEGABLAST (default settings, i.e. max. E-value = 0.05, word size = 28) in Geneious Prime. MEGABLAST results were sorted into two bins (hit vs. no hit) and identified hits were batch searched again to obtain a hit table with nucleotide identity and coverage values. Only CDS that were found to cover at least 70% of the BLAST searched CDS were considered as present. The analysis was carried out at the nucleotide level, since there is no comparable MEGABLAST for protein searches available.
To classify CDS that were located on the deletion site into functional categories, corresponding Clusters of Orthologous Groups (COG) identifiers were used. COG numbers were extracted from Prokka annotations and subsequently searched in the NCBI COG database. The total number of CDS in each detected COG class was counted to investigate overrepresented COG categories. If a CDS was assigned to more than one COG category, a 1 was added to the count of each respective COG class.

In silico prediction of virulence and resistance genes and sequence typing
Assembled genomes were submitted to BTyper software (Galaxy version 2.0.3, default settings) (Carroll et al., 2017) to identify virulence genes. The software uses a search/comparison based approach (default cutoffs: minimum nucleotide identity = 50%, minimum query coverage = 70%) against the BTyper virulence database, containing 44 virulence genes specific for the B. cereus group.
Besides virulence gene prediction, BTyper was used for in silico multi-locus sequence typing (MLST), panC clade assignment, and rpoB allelic typing. To determine the genome position of all sequences detected by BTyper software, the corresponding reference amino acid (aa) or nucleotide (nt) sequences were downloaded and tblastn (aa) or blastn (nt) searched in Geneious Prime. Location of BLAST hits and corresponding annotation IDs of strain NB125 were reported. Homologous genes (and their annotation IDs) for NB176-1 and NB176 can be found in the corresponding annotation tables (Supplementary Tables 1, 8).

Phylogenetic analysis
The complete genome sequences of NB125, NB176-1 and NB176 were uploaded to the Type (Strain) Genome Server (TYGS) for a whole genome-based taxonomic analysis (Meier-Kolthoff and Göker, 2019). Closest type strain genomes were determined automatically (Supplementary Table 11). All pairwise comparisons between the three uploaded strains as well as between the three strains and closest related type strains were performed using the Genome Blast Distance Phylogeny (GBDP) approach.
The automatically determined closest type strain genomes were downloaded from NCBI and subsequently uploaded to the JSpeciesWS webserver (Richter et al., 2016) for ANIb (Average nucleotide identity algorithm using BLAST) calculation (Goris et al., 2007). Pairwise ANIb values between the selected type strain genomes and the three genomes under assessment were calculated. The average of the two reciprocal ANIb values was calculated to provide a single ANIb value for each genome pair.

Hybrid assemblies and evaluation
To resolve the entire genome of NB125, NB176-1 and NB176, a hybrid de novo assembly based on short and long read sequencing data was performed. This approach resulted in sufficient high-quality reads to allow the separation of chromosomal and plasmid sequences and their complete reconstruction (Tables 3, 4). The genome length was found to vary between 6 to 6.3 Mbp with an identical GC content of 35.1% (Table 3). The chromosomes were 5.4 to 5.6 Mbp long and in each strain, the chromosome was associated with six putative plasmids that were between 14.9 to 250.5 kbp in length (Tables 3, 4). The plasmid sequences were called putative plasmids (ppl) since their identification method was sequence based. In the process of whole genome reconstruction, the Unicycler tool resolved the genome of NB125 entirely, without any manual adjustment. The assembled contigs of NB125 chromosome (5.6 Mbp) as well as its six plasmids (250-ppl, 185-ppl, 137-ppl, 68-ppl, 43-ppl and 14-ppl) were circular and covalently closed (Tables 3, 4). For NB176-1 and NB176, the reconstruction was hampered by inter-plasmidic repeats and required manual adjustment by fixing ambiguous repeat regions with the help of multiplicities. Here, the chromosomes were found to be 5.6 and 5.4 Mbp long for NB176-1 and NB176, respectively ( Table 3). The plasmid set up for NB176-1 and NB176 was slightly different from NB125: 250-ppl, 185-ppl, 99-ppl, 68-ppl, 43-ppl and 14-ppl (Table 4).
To assess the quality of the genome assemblies, paired Illumina reads were mapped against their corresponding whole genome assembly. Here, 99.71%, 99.73% and 99.86% of all Illumina reads of NB125, NB176-1 and NB176 mapped, respectively. To investigate the origin of unaligned reads, unmapped, paired-end reads were extracted, de novo assembled into contigs and taxonomy was assignend to each contig by BLAST search against the NCBI database. For each NB125 and NB176-1, two contigs were de novo assembled from unmapped paired-end reads. These contigs were assigned to cattle chromosomes (Bos gaurus, Bos taurus), most likely due to culture media: Prior to freeze-drying, the two strains deposited at DSMZ were cultured on media containing beef-extract. For NB176 it was not possible to de novo assemble the unmapped, paired-end reads into contiguous sequences. Therefore, 0.14% unassigned reads were found for NB176.
Another evaluation step comprised the identification of 450 bacillales core genes using BUSCO software leading to a completeness score of 99.8% (98.7% single copy, 1.1% duplicated and 0.2% missing) for NB125, NB176-1 and NB176. In all three strains, the tryptophan repressor (BUSCO ID 148693at1385) was missing and five BUSCO genes were duplicated (Table 5). Both copies of four duplicated BUCSO genes were located on the chromosome (Table 5), whereas recA was located on the chromosome and on 43-ppl.

Sequence comparison
The chromosome of NB176-1 was 1003 bp longer than that of NB125, but they shared an identical GC content (35.3%, Table 3). The chromosome of NB176 was shorter (~178.5 kbp) than those of NB125 and NB176-1 and had a slightly higher GC content of 35.4% (Table 3). Multiple sequence alignment revealed a chromosomal deletion of 178,499 bp in NB176 ( Figure 1A). To verify the detected deletion, short reads of NB176 were mapped against the de novo assembled genome sequence of the parental NB125 where a drop to zero in read depth between 4,203,170 and 4,381,670 confirmed the detected deletion ( Figure 1B). MSA was also used to compare the three chromosomes at the nucleotide level: The nucleotide sequence identity of NB125, NB176-1, and NB176 was above 96% and increased to >99.95% after excluding the deletion site from the alignment (Supplementary Table 13).

Genome annotation
For the chromosome of NB125, NB176-1 and NB176, a total number of 5760, 5761 and 5579 annotations were predicted, respectively. The majority of 5610 (NB125), 5611 (NB176-1) and 5429 (NB176) annotations were identified as CDS, whereas the remaining 150 annotations were assigned to RNA genes (Table 6). A total number of 30, 56, 78, 109, and 168 CDS were predicted for 14-ppl, 43-ppl, 68-ppl, 99-ppl, and 137-ppl, respectively (Table 4 and Supplementary Tables 2-6). For 185-ppl a total number of 168 CDS was predicted for each strain, although nucleotide sequences were slightly different (Table 4 and Supplementary Tables 7, 12). Homology analysis of CDS (homology aa cutoff ≥95%) revealed that there were no differences in gene content. In addition, order and orientation of predicted CDS was identical for 185-ppl. A total number of 237 (NB125, NB176) and 236 (NB176-1) CDS as well as one tRNA gene was found for 250-ppl (Table 4 and Supplementary  Table 8). According to the computer-based comparison of CDS, 236 CDS were found to be homologous in all strains. In 250-ppl of NB176-1, a nucleotide difference caused the stop codon to be skipped in CDS ID 250-ppl/NB176-1 = 66. Thus, CDS no. 66 of NB176-1 spans the length of the two CDS no. 66 and no. 67 of NB125 and NB176 (ID 2 5 0 -p p l / N B 1 2 5 / N B 1 7 6 = 66 + 67, Supplementary Table 8).
Coding sequences of the three annotated chromosomes were compared using an automated approach based on Roary software with a homology criterion of ≥95% amino acid sequence identity. This automated method was verified manually and adjusted when required. Based on this approach, a set of 5428 chromosomal CDS was found to be homologous for all three strains NB125, NB176-1, and NB176 (=shared genes) (Figure 3). The main differences in chromosomal CDS were the absence of 182 CDS (ID NB125 = 4312-4493) in NB176 due to the chromosomal deletion of 178.5 kbp, and one additional IS110 family transposase ISBth166 gene (ID NB176-1 = 1492 and ID NB176 = 1492) that was present in NB176-1 and NB176, but not in the parental NB125 ( Figure 3). Initially, two CDS were reported to fail the homology criterion due to short nucleotide insertions/deletions (indels): The homologous CDS ID NB125 = 1544, ID NB176-1 = 1545 and ID NB176 = 1545 were located at the same relative position of the chromosomes. It was found that ID NB176-1 = 1545 differed from ID NB125 = 1544 and ID NB176 = 1545 by a 441 bp deletion resulting in the loss of 147 amino acids (aa) (aa identity = 78.1%) (Table 7). Similarly, the homologous CDS ID NB125 = 3195, ID NB176-1 = 3196 and ID NB176 = 3196 differed from each other that ID NB176-1 = 3196 and ID NB176 = 3196 had a 294 bp deletion resulting in the loss of 98 aa in the middle of the aa sequence, which reduced the overall aa identity to 92.3%. The remaining predicted aa of the CDS were identical between all strains (Table 7).  Tables 1, 3). *ID of plasmid 43-ppl. **One copy was located on the chromosome, and the other copy was located on 43-ppl plasmid. It was further assessed if the additional transposase gene (ID NB176-1 = 1492 and ID NB176 = 1492) occurs once, or whether it is an extra gene copy that occurs frequently in the bacterial chromosome. The results of a blastn search revealed that this gene occurs 13 times on the chromosome of the parental NB125 (8 copies with 100% nt identity and 100% coverage, and 5 copies with 97.6% nt identity/100% coverage), whereas it occurs 14 times on the chromosome of NB176-1 and NB176 (9 copies with 100% nt  Schematic presentation of the six plasmids of NB125, NB176-1, and NB176. Each circular plasmid is linearized and represented in scale by a rectangle. Same colors represent homologous regions. Plasmid 137-ppl has a unique region (red) and a region of 72,351 nt (pink) that is shared with 99-ppl (100% nt sequence identity). In addition to that shared region, a shorter part of 99-ppl (27,082 nt) is also found in 185-ppl (blue, 100% nt sequence identity), located on an inter-plasmidic repeat region with a total length of 28,743 nt. This duplicated region is carrying the cry3Aa gene, which thereby is present in two copies, on 99-ppl and 185-ppl, for NB176-1 and NB176, and only in one copy for NB125. Besides this recombination, a deletion of a single stretch of plasmid DNA (65,061 nt) compared to 137-ppl of the parental strain NB125 was detected in NB176-1 and NB176.
In the next step, the automatic annotation of all 182 CDS located on the deletion site, was examined. From these, a total of 93 CDS (~51.1%) were annotated as hypothetical proteins (Supplementary Table 9). Eighty-nine CDS (~48.9%) were annotated with the name of an encoded protein and 34 (~38.2%) of these annotations included Table 9). Sixty (~67.4%) out of 89 CDS were assigned to Cluster of Orthologous Groups (COG) identifiers (Supplementary Table 9). To classify these genes into functional categories, the corresponding COG identifiers were searched in the NCBI COG database. Genes located on the deletion were assigned to 16 functional groups (+ R = General function prediction only, + S = Function unknown) (Figure 4). After this functional assignment, the total number of CDS in each detected COG class was counted. Most genes were assigned to the COG category Amino acid transport and biosynthesis. As an example, the ABC transporter complex TcyABC involved in L-cystine import is located within the deletion site. The second largest COG category represented proteins with function in Defense mechanisms. Several genes associated with bacitracin resistance, e.g. a bacitracin efflux BceAB type ATP transporter were detected. In addition, some annotations without a COG identifier can be assigned to the function Defense mechanisms e.g. tetracycline resistance protein, class B; guanidinium exporter, and bacitracin transport ATPbinding protein BcrA (Supplementary Table 9). Further examples for CDS located at the deletion site were two putative ABC transporters YknXYZ (Cell wall/membrane/envelope biogenesis; Defense mechanisms), a Na(+)/H(+)-K(+) antiporter GerN associated with Shared, variable and unique chromosomal CDS detected for parental strain NB125 and its derived strains NB176-1 and NB176. In total, 5611 CDS were detected, of which 5428 were shared by all three strains. The number of chromosomal CDS predicted for each strain are given in parenthesis below the strain names. Numbers inside the Venn diagram represent unique, variable and shared CDS. The software based approach identified ID NB176-1 = 1545 and ID NB125 = 3195 as unique genes due to their deletions. Manual verification identified these genes as homologous to CDS with deletions, present in the other two Btt strains, as well. endospore germination (Inorganic ion transport and metabolism) and a licABC operon (Carbohydrate transport and metabolism), involved in the uptake and metabolism of lichenin degradation products (Supplementary Table 9). To analyze, whether the 182 located within the deletion site, can be found somewhere else in the genome of NB176 the set of deleted CDS was extracted from the parental genome NB125 and batch searched against the NB176 genome using MEGABLAST algorithm. From the 182 CDS located on the chromosomal deletion, twelve CDS were found with hits elsewhere in the genome (Table 8). Four CDS were found with one homologous copy: ID NB125 = 4334, 4336, 4412, and 4417. Except for ID NB125 = 4336, which is located on 99-ppl, the CDS were located on the chromosome. For these four CDS, the nucleotide identity and CDS coverage ranged between 73.8% to 92.3% and 76.8% to 100%, respectively (Table 8).

Enzyme Commission (EC) numbers (Supplementary
The remaining eight CDS had more than one homologous CDS copy on the genome. Copies were found on the chromosome, 99ppl, 185-ppl and 250-ppl. Except for ID NB125 = 4426 and 4475, CDS copies were found in duplicates on the chromosome and another two copies on 185-ppl and 250-ppl (Table 8). The nucleotide identity of the multi copy CDS varied between 93.9% to 100% and the coverage of the CDS was above >99.8% (Table 8). A block of six consecutive CDS (ID NB125 = 4420 to 4425) was found in the same order and orientation at four other locations in the genome of NB176. This block of genes occurred twice on the chromosome and once on 185-ppl and 250-ppl (Table 8). This repetitively occurring block of CDS carried BcrA and BcrC genes, both associated with bacitracin resistance.
In total, three out twelve reoccurring CDS were annotated as hypothetical proteins; four CDS were annotated as transposase genes, and three were associated with bacitracin resistance, In addition, one virulence gene associated with secretion and one gene annotated as a transcriptional regulatory protein was found to be located elsewhere in the genome (Table 8).

In silico prediction of virulence and antimicrobial resistance genes
The presence of 44 virulence genes specific for the B. cereus group was examined in silico by BTyper software (nucleotide identity ≥50%, coverage ≥70%). For NB125, NB176-1 and NB176 the same set of 23 virulence genes was predicted (Table 9) and the corresponding nucleotide sequences were identical. Some enterotoxins associated with diarrheal symptoms are expressed by operons: Both nhe (nheABC) and hbl (hblABCD) operon genes were identified in their entirety in the genomes of three analyzed strains (Table 9). However, the nhe promoter was disrupted by a transposase (ID NB125 = 3455, ID NB176-1 = 3456, ID NB176 = 3456) insertion in all three strains. Further, three putative enterotoxin genes encoding for the one-component proteins FM (entFM), T (BceT) and A (entA); two metalloprotease genes (inhA1, inhA2); and two genes (clo, hlyII) encoding for pore forming hemolysins (cerolysin O, hemolysin II) were detected. Three different p ho s p ho l i p a s e C g e n e s w e r e f o u n d ; p l c A e n c o d i n g phosphatidylinositol-specific phospholipase C (PI-PLC), cerA encoding phosphatidylcholine-specific phospholipase C (PC-PLC), and cerB encoding sphingomyelinase (SM-PLC). PC-PLC and SM-PLC are encoded in an operon (cerAB) and form the biological complex cereolysin AB. Two genes coding for regulatory proteins, hlyIIR, regulating hlyII, and the pleiotropic regulator plcR, controlling the expression of multiple virulence genes, were found. Capsule genes (hasA, cap) and genes associated with the production of anthrax toxin (pagA, lef, atxA, cya) were not detected. In addition, all strains were negative for ces as well as cytK.
Three out of nine bps genes belonging to the exo-polysaccharide operon bpsX-H were predicted. However, the sequence identities were relatively low (<70%) for two of these positive hits (bpsF, bpsH) and the corresponding CDS were located on different chromosomal loci. In general, predicted virulence genes were mostly located on the Cluster of Orthologous Groups (COG) analysis of CDS located within the deletion site of NB176. The number of CDS per COG category is given. A total of 182 predicted CDS were located on the chromosomal deletion in NB176. Eighty-nine CDS were annotated with the name of the encoded protein and sixty of these annotations included COG identifiers. These COG numbers were searched in the NCBI COG database to classify the 60 genes into 18 groups (16 functional groups + R = General function prediction only, + S = Function unknown). Due to assignments of CDS into multiple groups, 74 assignments were made. To investigate overrepresented COG categories, the total number of CDS in each detected COG class was counted. If a CDS was assigned to more than one COG category, a 1 was added to the count of each respective COG class.
bacterial chromosome, except for plcA. This gene was not only predicted for the chromosome, but also for the 250-ppl with a lower nt sequence identity (Table 9). To verify this predicted plasmid-encoded phospholipase C gene despite the relatively low nt sequence identity, the corresponding aa sequence of ID NB125/250-ppl = 147 was blastp searched in the NCBI database, revealing that the aa sequence is identical to a multispecies phosphatidylinositol-specific phospholipase C domain-containing protein (WP_086405655.1).
By using ABRicate and AMRFinderPlus software, the presence of antimicrobial resistance (AMR) genes was assessed in silico. For each of the three strains, the same AMR genes were predicted (Table 10) and the corresponding nucleotide sequences were identical. In total, 13 The corresponding IDs of NB176 are provided. In total, 12 out of 182 CDS were detected using MEGABLAST (default settings, i.e. max. E-value = 0.05, word size = 28), followed by postprocessing using a query coverage cutoff of 70%. 1 ID number belongs to the corresponding replicon. Further information is provided in the respective annotation tables of the bacterial chromosome (Supplementary  Nucleotide identity between NB125, NB176-1, and NB176 = 100%, 2 Amino acid sequence of cerA and plcB references are identical and were therefore reported together. 3 located on plasmid 250-ppl. Virulence gene typing revealed identical results for all strains tested (NB125, NB176-1, NB176) and were therefore reported only once. Amino acid (aa) identity and coverage percentages (id [%]/cov [%]) for each predicted virulence gene are given. To describe the genome position of all sequences detected by BTyper software, the corresponding reference aa sequences were downloaded and tblastn searched in Geneious Prime. Corresponding annotation numbers of NB125 are reported. Homologous genes (and their Prokka annotation numbers) for NB176-1 and NB176 can be found in the corresponding annotation table (Supplementary Tables 1, 8).
AMR genes were predicted for the bacterial chromosome (Table 10). These genes are associated with resistance to beta-lactam (bla, bla2), glycopeptide (vanZ-F), macrolide (mhpL, abc-f), lincosamidestreptogramin (lsa), streptogramin (vat), streptothricin (sat A), phenicol (catA) and fosfomycin (fosB) drug families. Three resistance genes, assigned to aminoglycoside class of antibiotics, were predicted for plasmid 250-ppl (Table 10). No resistance genes were predicted for plasmids 185-ppl, 137-ppl, 99-ppl, 68-ppl, 43-ppl, and 14-ppl.  In silico analysis performed by BTyper software confirmed that the three strains NB125, NB176-1 and NB176 belonged to the B. cereus group. The different typing methods (panC, MLST, rpoB) revealed identical results for each Btt strain (Table 11). All three strains were assigned to panC clade 4 and the same closestmatching B. cereus group genome was reported. Analysis of seven MLST genes showed, that all three strains shared an identical allelic profile and were therefore assigned to the same sequence type (ST=23). Based on the rpoB sequence of each strain, the same allele type (AT0296) was identified, which showed a high similarity (100%) to B. cereus s.l.

Phylogenetic analysis
Based on digital DNA-DNA hybridization (dDDH), twelve most closely related type strain genomes, all belonging to the Bacillus cereus group, were determined automatically using the TYGS server (Supplementary Table 14 and Supplementary Figure 1). The most closely related type strain genome of NB125, NB176-1, and NB176 was Bacillus thuringiensis ATCC 10792 with a dDDH value of 68.3%, 68.1%, and 68.1%, respectively, followed by Bacillus cereus ATCC 14579 with a dDDH value of 65.6%, 65.6% and 65.7%, respectively (Supplementary Table 14). All pairwise dDDH values between the three genomes under assessment and the type strain genomes were below the species delineation threshold of 70%, but the upper limit of the 95% confidence interval regarding  Nucleotide identity between NB125, NB176-1 and NB176 = 100%. * Stop codon within sequence hit, therefore two CDS cover sequence hit. ABRicate, including the ARG-ANNOT (2022-Jun-13, n = 1749), ResFinder (2022-Jun-13; n = 3144), CARD (2022-Jun-13; n = 2220), and NCBI AMRFinderPlus (2022-Jun-13; n = 6146) databases, was used with an 80% nucleotide identity and 70% coverage threshold. The tool AMRFinderPlus (database version 2022-05-26.1) was run in combined mode (nucleotide + protein) with default settings. Note that AMRFinderPlus uses curated BLAST and HMM cutoffs to optimize specificity and sensitivity of AMR detection. Therefore, results are reported, even if the identity and coverage values are below the default cutoff settings. If the same AMR gene was predicted using different algorithms and/or underlying databases, the results were combined into one row of the table. AMR gene typing revealed identical results for NB125, NB176-1, and NB176 and were therefore reported only once using the parental strain NB125 as reference. Homologous genes of NB176-1 and NB176 can be found in the corresponding annotation tables (Supplementary Tables 1, 8).
Bacillus thuringiensis ATCC 10792 was above 70%. According to dDDH, NB125, NB176-1, and NB176 are highly similar, with p a i r w i s e d D D H v a l u e s r a n g i n g fr o m 9 9 .9 t o 1 0 0 % (Supplementary Table 14), resulting in the same species (70% dDDH species threshold) as well as subspecies (79% dDDH subspecies threshold) clustering (Supplementary Table 14, Supplementary Figure 1). The overall high sequence similarity of NB125, NB176-1 and NB176 was further confirmed by pairwise ANIb values ranging from 99.89 to 99.99% (Supplementary  Table 15). Pairwise ANIb analysis of the three genome sequences and the closest type strains revealed that NB125, NB176-1 and NB176 showed ANIb values slightly above 95% with both B. thuringiensis and B. cereus type strain genomes (Supplementary Table 15).

Discussion
With the first and complete de novo reconstruction of the genome of NB125, the molecular changes that led to the creation of NB176-1 and NB176, the current production strain formulated in the plant protection product Novodor ® FC, could be traced. To decipher their complete genomes, a hybrid approach of short (Illumina) and long read (Nanopore) sequencing (Wick et al., 2017b) was applied, which allowed the detection of all genomic features underlining the strength of this technique. Long sequencing reads spanning repetitive regions helped to resolve these critical genomic features although manual intervention had been required to finalize the assembly (Wick et al., 2017a). The reliability of the genomic assemblies of NB125, NB176-1 and NB176 were supported by the recent complete genome assembly of Bacillus thuringiensis serovar morrisoni (strain 4AA1) (Genbank assembly GCA_022810725.1). Its chromosomal length, number and length of plasmids was similar to NB125, NB176-1 and NB176, although the applied sequencing technique was different (Illumina sequencing, PacBio (single molecule sequencing) and Sanger sequencing techniques). In general, resolving repetitive sequences and multi-copy genes compromising the correct assembly of high throughput sequencing data is key to a complete bacterial genome reconstruction. The challenge of unresolved plasmid sequences might be the reason why in a previous sequencing approach of NB176 (Biggel et al., 2021) the identification of the plasmid number and length was not successful.
In the literature, NB176-1 and NB176 were described as highyielding derivatives of NB125 (Adams et al., 1994;Gurtler and Petersen, 1994;European Food Safety Authority, 2013). However, only little information was known on the differences between the parental strain NB125 and its radiation-derived strains. Although a duplication of the cry3Aa gene in NB176-1 was already predicted by Adams et al. (1994) by Southern hybridization and partial gene sequencing, a conclusive proof is finally provided by this study, identifying cry3Aa on plasmids 99-ppl and 185-ppl of NB176-1. Furthermore, the genomic analysis revealed that 99-ppl was a recombination product of the original 137-ppl plasmid and a region of 185-ppl carrying the cry3Aa gene and therefore resulting in its duplication and the Cry3Aa overproducing phenotype of NB176-1. By this finding, the origin of the cry3Aa gene could be unequivocally defined on the 185-ppl (~120 kDa) of NB125, which is close to the plasmid size of 90 kDa that had been proposed as location of cry3Aa (Sekar et al., 1987).
Besides the noted re-arrangement, we further propose that the shortening of 137-ppl, which led to the creation of 99-ppl, was mediated by Tn3 family transposases (TnBth1, TnBth2) and an IS4 family transposase IS231C. It was described previously that insertion sequences, such as IS231 and IS232 and transposons are frequently found in the vicinity of cry genes enabling transposition and gene duplication (Menou et al., 1990;Mahillon et al., 1994;Murawska et al., 2014;Fiedoruk et al., 2017). In addition to the cry3Aa gene, two other insecticidal genes, encoding for Mpp23Aa1 (old name: Cry23Aa) and Xpp37Aa1 (old name: Cry37Aa) were located on the original Cry3Aa encoding plasmid 185-ppl. These findings are in line with Caballero et al. (2020) who suggested that the commercial strain NB176 produces the three insecticidal proteins Cry3Aa, Mpp23Aa1 (Cry23Aa), and Xpp37Aa1 (Cry37Aa) (Caballero et al., 2020). These conclusions are difficult to draw from the previously published incomplete assembly of NB176 (Biggel et al., 2021) since collapsing repeat regions did not allow assumptions about changes and recombination in plasmid sequences. Genome assemblies are further challenged by the presence of repetitive mobile regions that could flank d-endotoxin and antimicrobial resistance genes (Kronstad et al., 1983;Fiedoruk et al., 2017;Orlek et al., 2017) hiding their genomic position (He et al., 2015;Sheppard et al., 2016;Arredondo-Alonso et al., 2017). The exact location of antimicrobial resistance genes is crucial, because they might be spread when present on mobile genetic elements such as plasmids (Tschäpe, 1994;Thomas and Nielsen, 2005;Berbers et al., 2020).
Although NB176-1 and NB176 are almost identical in their plasmid sequences, a single chromosomal deletion of~175.8 kbp was identified in NB176, when compared to NB125 and NB176-1. The deletion comprised a set of 182 CDS, of which 170 CDS had no additional copies on the chromosomal or any other plasmid sequences. Although sixty CDS located on the deletion site were used for functional analyses, it remains unclear how this deletion influence phenotypic characteristics. Some of the deleted genes might enable the parental strain NB125 to survive in its natural environment but may be unnecessary under production conditions. Since NB176-1 was deposited at the DSMZ in 1989 and NB176 is representing the current strain under commercial production, it can be hypothesized that the genome of NB176 in its current state is highly stable and did not undergo any further changes. Since the genomes of NB176-1 and NB176 are almost identical, the 175.8 kbp deletion could be regarded as a singular event in the history of NB176.
The screening for antimicrobial resistance genes and "Bacillus cereus s.l." (Bc) virulence genes did not reveal differences between the parental and derivative strains. Regarding enterotoxin genes, all three Btt strains harbored genes encoding for the non-hemolytic enterotoxin (nhe) and hemolysin BL (hbl), but were negative for cytotoxin K (cyt K), consistent with previous studies concerning Btt (Kyei-Poku et al., 2007;Schwenk et al., 2020;Biggel et al., 2021;Bonis et al., 2021). Moreover, this study confirmed that the nhe promoter was disrupted by a transposase insertion in all three strains and thus that Nhe expression might be prevented in the commercial NB176 (Johler et al., 2018;Biggel et al., 2021). Genes encoding for one-component putative enterotoxins (entFM, bceT, entA) pore forming haemolysins (clo, hlyII), metalloproteases (inhA1, inhA2), phospholipases (plcA, cerA, cerB), and regulatory proteins (hlyIIR, plcR) were detected. It can be assumed that the three detected exo-polysaccharide genes (bpsE/ F/H) are false positive hits, since only three out of nine bpsX-H genes were detected and located on different chromosomal loci. Previous studies have demonstrated that some members of the B. cereus group possess genes sharing a high degree of homology with Bps-encoding genes (Carroll et al., 2019;Carroll et al., 2020). Therefore, it was concluded that 19 chromosomally and one plasmid-encoded (putative phosphatidylinositol-specific phospholipase C located on plasmid 250ppl) virulence genes were found.
Thirteen chromosomally encoded AMR genes were predicted. Four beta-lactamase genes were detected, in agreement with previous studies that found beta-lactamase genes as well as phenotypic resistance to be very common in the B. cereus group, including B. thuringiensis (Luna et al., 2007;Park et al., 2009;Kim et al., 2015;Yim et al., 2015;Fiedler et al., 2019;Bianco et al., 2021;Mills et al., 2022). Even though a single vancomycin resistance gene was found, a vancomycin resistance seems unlikely, as individual van genes were previously reported in the B. cereus group but were insufficient to confer phenotypic resistance (Arthur et al., 1992;Patel et al., 1997;Bianco et al., 2021;Mills et al., 2022). Genes associated with resistance to macrolide (mhpL, abc-f), lincosamide-streptogramin (lsa), streptogramin (vat), streptothricin (sat A), phenicol (catA) and fosfomycin (fosB) drug families were also detected in the three analyzed strains. In addition to the chromosomally encoded AMR genes, three nucleotidyltranferase genes associated with aminoglycoside resistance were predicted for plasmid 250-ppl. Some isolates of the B. cereus group are known for aminoglycoside resistance (Parulekar and Sonawane, 2018;Fiedler et al., 2019) whereas others carried nucleotidyltransferase genes but were not phenotypically resistant (Fiedler et al., 2019). Since only few studies associate the presence of AMR (Mills et al., 2022) or virulence genes to phenotypic traits, no direct conclusion can be drawn from in silico studies about the actual functionality of any of these genes or the conditions under which these genes might be expressed.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/, PRJNA886432.