Genomic analysis of six new Geobacillus strains reveals highly conserved carbohydrate degradation architectures and strategies

In this work we report the whole genome sequences of six new Geobacillus xylanolytic strains along with the genomic analysis of their capability to degrade carbohydrates. The six sequenced Geobacillus strains described here have a range of GC contents from 43.9% to 52.5% and clade with named Geobacillus species throughout the entire genus. We have identified a ~200 kb unique super-cluster in all six strains, containing five to eight distinct carbohydrate degradation clusters in a single genomic region, a feature not seen in other genera. The Geobacillus strains rely on a small number of secreted enzymes located within distinct clusters for carbohydrate utilization, in contrast to most biomass-degrading organisms which contain numerous secreted enzymes located randomly throughout the genomes. All six strains are able to utilize fructose, arabinose, xylose, mannitol, gluconate, xylan, and α-1,6-glucosides. The gene clusters for utilization of these seven substrates have identical organization and the individual proteins have a high percent identity to their homologs. The strains show significant differences in their ability to utilize inositol, sucrose, lactose, α-mannosides, α-1,4-glucosides and arabinan.

A 23.55 kb genomic DNA fragment from Geobacillus stearothermophilus strain T-6 contains the genes for extracellular and intracellular xylanases, β-xylosidase, and 12 genes involved in transport and metabolism of glucuronic acid (Shulami et al., 1999). The organization of the arabinan utilization genes from this organism, which form a separate cluster contiguous to the xylan utilization cluster, was described later (Shulami et al., 2011). A complete genome sequence for G. stearothermophilus strain T-6 has not been published, resulting in only limited understanding of the organization of xylan and arabinan metabolism within the G. stearothermophilus genome. Without a complete genome, it is also unclear if the genes present in these two clusters represent the complete set of genes needed for pentosan degradation. Without complete genome sequences, it is impossible to determine the genomic context of the individual enzymes described above, and if these individual enzymes are present at the genus, species, or strain level.
Whole genome sequencing is a potent tool for understanding the collection of genes a microorganism utilizes for carbohydrate degradation (Suen et al., 2011; Mead et al., 2012, 2013; Christopherson et al., 2013). To date, only a limited number of complete Geobacillus genomes have been published, including G. thermodenitrificans (Feng et al., 2007; Yao et al., 2013), G. kaustophilus (Takami et al., 2004), Geobacillus sp. strain GHH01 (Wiegand et al., 2013), Geobacillus sp. strain JF8 (Shintani et al., 2014), G. thermoglucosidans TNO-09.020, and G. thermoleovorans CCB_US3_UF5 (Muhd Sakaff et al., 2012), and no detailed analysis of the carbohydrate degradation systems of these organisms has been published. Our group has isolated six novel xylanolytic Geobacillus strains as part of an effort to identify new, high specific activity thermophilic enzymes. The genomes of all six strains have been determined, with five of the six genome sequences deposited in GenBank, and the sixth available via the JGI genome portal. Using these genome resources, the carbohydrate degradation clusters in these six strains were identified and compared. The results of this analysis revealed that both the organization and the individual genes of carbohydrate metabolism are highly conserved throughout the genus. In addition, many of these carbohydrate degradation clusters reside in a single, 200-kb conserved genome region.
Geobacillus strains were isolated from environmental samples (Table 1) on YTP-2 agar (containing, per liter, 2.0 g yeast extract, 2.0 g tryptone, 2.0 g sodium pyruvate, 1.0 g KCl, 2.0 g KNO3, 2.0 g Na2HPO4·7H2O, 0.1 g MgSO4, 0.03 g CaCl2, 8.0 g agar, and 2.0 ml clarified tomato juice) at 70°C as described previously (Mead et al., 2012). For preparation of genomic DNA, 1 liter cultures of Geobacillus isolates were grown from a single colony in YTP-2 medium at 70°C in flasks agitated at 200 rpm for 18 h and collected by centrifugation. The cell concentrate was lysed using a combination of SDS and proteinase K, and genomic DNA was isolated using a phenol/chloroform extraction (Sambrook et al., 1989). The genomic DNA was precipitated and treated with RNase to remove residual contaminating RNA.
Cultures for enzyme assays were grown in 1.0 ml of YT2 medium (containing, per liter, 2.0 g yeast extract, 2.0 g tryptone, 2.0 g carbohydrate substrate, 1.0 g KCl, 2.0 g KNO3, 2.0 g Na2HPO4·7H2O, 0.1 g MgSO4, 0.03 g CaCl2, 8.0 g agar, and 2.0 ml clarified tomato juice). Cultures were grown from single colonies at 70°C in 2.0 ml screw-cap vials for 72 h at 1000 rpm in a Thermomixer R (Eppendorf, Hamburg, Germany). Cells were recovered by centrifugation, and the cell pellets were lysed by treatment with 0.1 ml of CelLytic IIB reagent. Qualitative endoactivities of supernatant and lysate samples were determined in 0.50 ml of 50 mM acetate buffer, pH 5.8, containing 0.2% AZCL insoluble substrates and 50 µl of supernatant or 10 µl of clarified lysate. Assays were performed overnight at 70°C, with shaking at 1000 rpm in a Thermomixer R. Tubes were clarified by centrifugation, and absorbance values at 600 nm were determined using a Bio-Tek ELx800 plate reader. The exoactivities of supernatant and lysate samples were determined by spotting 5.0 µl of clarified lysate directly on agar plates containing 10 mM 4-methylumbelliferyl substrate. Plates were incubated in a 70°C incubator for 2 h; after incubation, the plates were examined using a hand-held UV lamp and compared with negative and positive controls. Duplicate cultures were used for all assay experiments.
The genomes of six Geobacillus isolates were sequenced at the Joint Genome Institute (JGI) using Sanger sequencing with a combination of 6 kb and 34 kb DNA libraries and 454 FLX pyrosequencing done to a depth of 20× coverage; Solexa sequencing data were used to polish the assemblies. All general aspects of library construction and sequencing performed at the JGI can be found at their website. The Phred/Phrap/Consed software package (Lee and Vega, 2004; Machado et al., 2011) was used to assemble 6-kb and fosmid libraries.
Genes were identified using Prodigal (Hyatt et al., 2010) as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline (Pati et al., 2010). The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant protein database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE (Lowe and Eddy, 1997), RNAmmer (Lagesen et al., 2007), Rfam (Griffiths-Jones et al., 2003), TMHMM (Chen et al., 2003), and SignalP (Krogh et al., 2001). The Geobacillus cultures are available from the Bacillus Genetics Stock Center (BGSC) at Ohio State University; all genome sequences can be accessed online (Table 1).
The phylogeny of the novel Geobacillus strains was determined using the 16S rRNA gene sequences of the six sequenced strains, as well as those of the type strains of all validly described Geobacillus spp. The 16S rRNA gene sequences were aligned using MUSCLE (Edgar, 2004), pairwise distances were estimated using the Maximum Composite Likelihood (MCL) approach, and initial trees for the heuristic search were obtained automatically by applying the Neighbour-Joining method in MEGA 5 (Tamura et al., 2011). The alignment and heuristic trees were then used to infer the phylogeny using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei, 1993).
Carbohydrate utilization enzymes were identified from UniProt (Apweiler et al., 2004; Consortium, 2013, 2014), and BLASTp analysis (Cameron et al., 2004) was used to identify orthologs in the genomes. Neighborhood analysis was performed using IMG tools (Markowitz et al., 2012) to determine clusters and manually curate the electronic annotations.
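The ortholog search described above can be illustrated with a short sketch. The text states only that BLASTp was used to identify orthologs; the reciprocal-best-hit criterion shown here is a common convention and is an assumption, not necessarily the procedure the authors followed. The input is assumed to be standard 12-column tab-separated BLAST output (bit score in the last column), and the function names are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to its highest-bit-score subject ID.

    Expects standard 12-column tab-separated BLAST output:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit.

    Assumption: reciprocal best hits are used as a proxy for orthology.
    """
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

In practice the two inputs would be the reference proteins searched against a strain's proteome and the reverse search; candidate pairs would still need the manual curation described in the text.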

Results
As part of a project to identify new thermophilic enzymes that degrade biomass, microbial cultures from hot springs and composts were isolated and biochemically screened to identify novel, aerobic, biomass-degrading thermophiles. Aerobic enrichments were performed at 70°C, and the vast majority of the 100 isolates were Geobacillus or Thermus species. Six of these Geobacillus isolates were selected for additional characterization based on the ability of colonies to hydrolyze MUX or MUC incorporated into agar plates. Five of these isolates were from hot springs in the United States (Yellowstone National Park and Nevada) and one was from a grass compost sample collected in Middleton, WI (Table 1).
To determine if these isolates produced xylan-degrading enzymes, the six selected cultures (designated C56-YS93 (YS93), G11MC16 (1MC16), Y412MC52 (MC52), Y412MC61 (MC61), C56-56T2 (56T2), and C56-T3 (56T3)) were grown in 1.0 ml cultures of YT2 media containing one of six carbohydrate substrates (pyruvate, glucose, xylose, arabinose, xylo-oligosaccharides and arabinogalactan) and assayed qualitatively for the production and activity of extracellular xylanase and intracellular β-xylosidase as described in Materials and Methods. All six strains produced extracellular xylanase when grown on either xylose or pyruvate (Table 2). In addition, extracellular xylanase was produced by at least three of the cultures when grown on arabinose, arabinogalactan or xylo-oligosaccharides. None of the six strains produced extracellular xylanase when grown on glucose, in agreement with reports of catabolite repression of G. stearothermophilus extracellular xylanase production (Cho and Choi, 1999). Intracellular β-xylosidase was produced by all six strains when grown on xylose, while none of the strains produced intracellular β-xylosidase when grown on pyruvate or glucose. Only one strain (YS93) produced intracellular β-xylosidase when grown on arabinose, arabinogalactan and xylo-oligosaccharides. The results of the extracellular and intracellular assays confirmed that all six strains possess the ability to degrade xylan.
Based on the positive results obtained in the enzyme screening experiments, the six strains were submitted for sequencing by the JGI of the Department of Energy. Genome sequencing yielded five closed genomes, with one isolate, 1MC16, left as a permanent draft genome containing 31 contigs (Table 3). The genomes are all of similar size, ranging from 3.5 to 4.0 megabases. Plasmid content varies from none in 56T3 to one in MC52 and MC61 and two in YS93 and 56T2. The presence of plasmids in 1MC16 could not be confirmed from the assembled contigs.
The genomes display significantly different G+C contents. YS93 has a mean genomic G+C content of 43.9%, 1MC16 has an intermediate value of 48.8% G+C, and MC52, MC61, 56T2, and 56T3 have significantly higher values of 52.3-52.5% G+C (Table 3).
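For readers unfamiliar with the metric, the mean G+C content reported above is the percentage of unambiguous bases that are G or C. The following is a minimal sketch of that calculation; the function names are hypothetical, and ignoring ambiguous bases such as N is an assumption about how the value is computed.

```python
def gc_content(sequence):
    """Return the G+C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted,
    so ambiguity codes such as N do not affect the result.
    """
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genome_gc_content(replicons):
    """Length-weighted G+C percentage across contigs or replicons."""
    return gc_content("".join(replicons))
```

Concatenating the replicons before counting gives a length-weighted mean, so a small plasmid does not shift the genomic value as much as a simple average of per-replicon percentages would.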
A phylogenetic tree of 16S rRNA gene sequences was constructed using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei, 1993) to determine the phylogenetic positions of the novel strains. The resulting tree (Figure 1) shows that YS93 clades with G. thermoglucosidasius, 1MC16 clades with G. thermodenitrificans, and MC52, MC61, and 56T3 clade with G. stearothermophilus and G. thermocatenulatus, while 56T2 may represent a novel species of Geobacillus. To confirm the assignments obtained with 16S rRNA gene sequences, pairwise average nucleotide identity values were calculated for the six strains against all draft, permanent draft, and finished Geobacillus genomes in the IMG database. Average nucleotide identity (ANI) values (Kim et al., 2014) were calculated using software developed for the IMG (Markowitz et al., 2006, 2014). The results (Table 4) confirm the classification of the strains obtained using 16S rRNA gene sequences. YS93 clades with other G. thermoglucosidasius strains (pink), 1MC16 clades with G. thermodenitrificans strains (blue), MC52, MC61, and 56T3 clade together in what appears to be a new species (yellow), and 56T2 appears to clade only with itself (gray).

Identification of Metabolic Clusters
The six genomes were searched for the location of orthologs of the xylan cluster described in G. stearothermophilus T-6. Surprisingly, in all six strains the xylan utilization cluster is located in a similar, highly conserved region of the Geobacillus genomes (Figure 2). In all six strains, this genome region contains clusters for the utilization of xylan as well as fructose, cellobiose, gluconate, and mannitol utilization clusters. In five of the six strains, clusters for the utilization of arabinan, arabinose, and ribose are also present in this region. Inositol and α-mannoside utilization clusters are present in this region in one strain. In addition to carbohydrate utilization clusters, all six strains possess a 16-gene cobalamin biosynthesis cluster and a 4-gene nitrite reductase cluster. Five of the six strains contain a 13-gene urea utilization cluster and a 4-gene nitrate reductase cluster. This large super-cluster of metabolic clusters, conserved at the genus level, appears to be a unique feature of Geobacillus. Carbohydrate utilization clusters found in this ∼200 kb region of the genomes will be described first, proceeding in the direction of transcription. Following these descriptions, carbohydrate utilization clusters not found in the ∼200 kb region will be described.
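One simple way to think about how individual genes are grouped into the clusters discussed here is by genomic proximity. The sketch below is illustrative only: the clusters in this work were determined with IMG neighborhood tools and manual curation, whereas the function name, the tuple layout, and the `max_gap` threshold of 300 bp are all hypothetical choices made for the example.

```python
def group_by_proximity(genes, max_gap=300):
    """Group genes into candidate clusters by genomic proximity.

    genes: iterable of (name, start, end) tuples on one replicon.
    max_gap: hypothetical threshold (bp); a gene joins the current
    cluster when its start lies within max_gap of the cluster's end.
    """
    clusters = []
    cluster_end = None
    for gene in sorted(genes, key=lambda g: g[1]):
        name, start, end = gene
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append(gene)
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([gene])
            cluster_end = end
    return clusters
```

Proximity alone would overcall clusters in gene-dense bacterial genomes, which is one reason functional annotation and manual curation are still needed to decide where a utilization cluster begins and ends.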

Carbohydrate Clusters Found in the ∼200 Kb Region of the Sequenced Geobacillus Strains

Mannitol Metabolism
In all six strains, orthologous clusters code for a three-component phosphotransferase system (PTS) that uses phosphoenolpyruvate to transport the sugar into the cell and phosphorylate it, generating intracellular mannitol-1-phosphate. A MtlR family transcriptional regulator controls mannitol uptake in all six strains. The six mannitol utilization clusters also contain a gene coding for mannitol-1-phosphate 5-dehydrogenase, which converts the mannitol-1-phosphate to fructose-6-phosphate. Similar transport and metabolism clusters are used for fructose, cellobiose and sucrose metabolism.
FIGURE 1 | The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model (Tamura and Nei, 1993). The tree with the highest log likelihood (−3118.4467) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 24 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 1260 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 (Tamura et al., 2011). The type strains of all validly described species are included (NCBI accession numbers): G. caldoxylolyticus The 16S rRNA sequence of Paenibacillus lautus JCM9073 T (AB073188) was used to root the tree.

Gluconate Metabolism
All six strains possess an orthologous cluster for gluconate utilization similar to the GntU, GntK, and GntR cluster found in E. coli (Tong et al., 1996). Unlike the B. subtilis gluconate utilization cluster (Reizer et al., 1991), the Geobacillus cluster does not include a GntZ gene coding for 6-phosphogluconate dehydrogenase. The GntZ gene is present in all six strains, located randomly throughout the genomes.

α-Mannosides and Inositol-phosphate Utilization
Only one of the six strains, 1MC16, possesses the ability to utilize either inositol-phosphates or α-mannosides. The two clusters are located upstream of the gluconate cluster, where the other five Geobacillus genomes contain a 13-gene urease/urea utilization cluster. The mannoside utilization cluster has a 3-component ABC transporter system and an intracellular α-mannosidase, all under the control of a GntR family transcriptional regulator. Orthologous mannosidase clusters are present in the genomes of G. thermodenitrificans DSM 465 and G. thermodenitrificans NG80-2 (Feng et al., 2007), and the individual genes of these two strains are 99-100% identical to their 1MC16 gene counterparts. The inositol-phosphate utilization cluster (Table 5) has two separate parts. The first is a five-gene cluster containing a 3-component ABC transporter system, inositol 2-dehydrogenase, and an oxidoreductase domain protein. Following this is an inositol metabolic gene cluster of iolG, iolD, iolE, iolB, iolC, and iolA, all under the control of a LacI family transcriptional regulator. Identical inositol-phosphate utilization clusters are present in G. thermodenitrificans DSM 465 and G. thermodenitrificans NG80-2. Other Geobacillus species possess similar clusters, but these are organized with the three genes of the ABC transporter following the gene for iolG (Yoshida et al., 2012) and with an additional protein, IolI, 2-keto-myo-inositol isomerase (Figure 3).
FIGURE 2 | Diagram of major functional clusters found in the conserved regions; carbohydrate utilization clusters are shown in red, non-carbohydrate clusters in blue. Cob, cobalamin biosynthetic cluster; NO3, nitrate reductase cluster; Fruc, fructose utilization cluster; Cell, cellobiose utilization cluster; NO2, nitrite reductase cluster; Xyn, xylose and xylan utilization cluster; Ara, arabinose and arabinan utilization cluster, and ribose transporter cluster; Pep, peptide utilization cluster; Urea, urease and urea utilization cluster; Inos, inositol-phosphate utilization cluster; αMan, α-mannoside utilization cluster; GLcn, gluconate utilization cluster; Mtl, mannitol utilization cluster. The gene sequence values for the corresponding genome regions are (start-end): MC61, 2635441-2855821; MC52, 1775380-1995757; 56T2, 1737107-1912324; 56T3, 1646809-1858633; YS93, 2080803-2255158; 1MC16, contig ABVH01000004 28446-229812.
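The start and end coordinates listed in the Figure 2 legend make it possible to check the "∼200 kb" description of the conserved region directly. The short sketch below computes each region's length from those values; the only assumption is that the coordinates are 1-based and inclusive, and the function name is hypothetical.

```python
# Start and end coordinates of the conserved region, as listed in the
# Figure 2 legend. The 1MC16 values are positions on contig ABVH01000004.
REGIONS = {
    "MC61": (2635441, 2855821),
    "MC52": (1775380, 1995757),
    "56T2": (1737107, 1912324),
    "56T3": (1646809, 1858633),
    "YS93": (2080803, 2255158),
    "1MC16": (28446, 229812),
}


def region_length_bp(start, end):
    """Length of a coordinate range in bp, assuming 1-based inclusive ends."""
    return end - start + 1


LENGTHS = {strain: region_length_bp(*span) for strain, span in REGIONS.items()}

if __name__ == "__main__":
    for strain, length in sorted(LENGTHS.items(), key=lambda item: item[1]):
        print(f"{strain}: {length / 1000:.1f} kb")
```

Under that assumption the regions range from roughly 174 kb (YS93) to roughly 220 kb (MC52 and MC61), which is consistent with the approximate 200 kb figure used throughout the text.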

Arabinose and Arabinan Metabolism
Unlike xylan utilization, arabinose and arabinan utilization capability is strongly strain dependent (Table 6). None of the strains possesses the complete arabinan cluster present in G. stearothermophilus T-6 (Shulami et al., 2011). Strain YS93 possesses none of the enzymes required for uptake and metabolism of either arabinose or arabinan. Strain 56T2 possesses the genes for metabolism of arabinose (genes 7-10 and 22-27) but none of the dedicated transporter systems. This suggests that the organism can utilize arabinose in arabinoxylan oligosaccharides that were transported into the cell by xylan transporters, but not extracellular arabinose or arabinan. The arabinose utilization clusters of strains MC52 and MC61 are most similar to the reported T-6 arabinan cluster, lacking only one of the two transporter clusters found in T-6 (genes 1-6). This suggests that these two organisms can utilize the full range of arabinose, small arabinan oligosaccharides, and linear arabinan. 1MC16 possesses two ABC transporter systems (genes 4-6 and 19-21). The first transporter system is orthologous to the G. stearothermophilus T-6 araT, araE, araG transporter, while the second has no orthologs in T-6. While the three-component araT, araE, araG transporter system is annotated as an arabinose transport system, the three genes show remarkable homology to the rbsA, rbsB, rbsC cluster responsible for transport of ribose in B. subtilis (Woodson and Devine, 1994; Strauch, 1995), and may actually function as a ribose transport system within the arabinan-arabinose cluster. Strain 1MC16 lacks the eight-gene cluster containing the extracellular arabinanase, transporter and intracellular endo-α-arabinanase, α-L-arabinofuranosidase, and arabinopyranosidase (genes 11-18) present in the T-6 cluster. 1MC16 possesses all metabolic enzymes needed for arabinose and potentially arabinooligosaccharide metabolism (genes 7-10 and 23-27), suggesting that the organism can utilize arabinose in arabinoxylan oligosaccharides, extracellular arabinose and possibly small arabinan oligosaccharides.
The most complex arabinose metabolic system is present in 56T3. 56T3 possesses an arabinose cluster that is orthologous to the 1MC16 cluster described above. However, in addition to this cluster, 56T3 possesses a seven-gene arabinan-utilization cluster consisting of a transcription regulator, three-component ABC transporter system, and three intracellular proteins, an arabinase, an arabinofuranosidase, and an annotated oxidoreductase with unknown function (Table 7). This cluster is located adjacent to the galactose utilization cluster in 56T3 and it is not orthologous to the clusters found in T-6, MC52, and MC61, but is closely related to the cluster found in the unpublished genome of Geobacillus sp. MAS1 (NCBI/RefSeq: AYSF01000001 through AYSF01000006) as well as distantly related to clusters in Bacillus spp. and Anoxybacillus tepidamans PS2.

Xylose and Xylan Metabolism
As expected from the fermentation results, all six strains possess gene clusters for xylan degradation and metabolism. Xylose and xylan are transported and metabolized by all six strains via a large single cluster containing as many as 32 genes (De Maayer et al., 2014) (Table 8). A single secreted xylanase (XynA) degrades xylan into oligosaccharides. Two three-gene ABC transporters of xylose and xylooligosaccharides are present in all six strains (shown in bold, genes 3, 4, 5 and 10, 11, 12). In addition, strains 56T2 and 56T3 contain a third three-gene ABC transporter (genes 27, 28, 29). The transported oligosaccharides are further degraded into monosaccharides within the cell by an intracellular xylanase (XynA2), xylosidases (XynB and XynB2) and an α-glucuronidase (AguA) similar to those described in G.

Cellobiose and Fructose Metabolism
Cellobiose and fructose are utilized by all six strains via dedicated phosphotransferase system (PTS) transporter systems. In all six strains, orthologous clusters code for three-component PTS transporter systems that use phosphoenolpyruvate to transport the sugar into the cell and phosphorylate it, generating intracellular fructose-1-phosphate or cellobiose-6-phosphate. A MerR family transcriptional regulator controls cellobiose uptake in all six strains. The six cellobiose utilization clusters also contain a gene coding for 6-phospho-β-glucosidase, which converts cellobiose-6-phosphate to glucose and glucose-6-phosphate. A DeoR family transcriptional regulator controls fructose uptake in all six strains. The six fructose utilization clusters also contain a gene coding for 1-phosphofructokinase, which converts fructose-1-phosphate to fructose-1,6-diphosphate.

Carbohydrate Clusters Found Outside the ∼200 Kb Region

Starch Metabolism
Two separate gene clusters are dedicated to degradation of starch, one targeting α-1,4-linked glucooligosaccharides, and one targeting α-1,6-linked glucooligosaccharides. Genomic analysis indicates that five of the six strains (YS93 being the exception) possess the ability to degrade α-1,4-linked starch and starch-derived α-1,4-linked glucans. In the five strains, an orthologous cluster codes for a secreted α-amylase, a three-component ABC transporter system, and an intracellular α-amylase, all under the control of a LacI family transcriptional regulator (Table 9). The secreted α-amylase, transcriptional regulator and the three-component ABC transporter system show >90% identity among the five strains. The intracellular α-amylase genes of strains MC52, MC61, 56T3 and 56T2 code for 588 a.a. proteins with >90% identity to each other, but in 1MC16, the gene is truncated, coding for a 297 a.a. protein corresponding to the N-terminal domain of the 588 a.a. protein. In addition to the six-gene cluster, strains MC52, MC61, 56T3, and 56T2 possess an identical, two-gene insert containing a different secreted α-amylase (amyS) and a secreted amylopullulanase, located far downstream from the starch cluster. The utilization of three distinct secreted enzymes for degradation of starch is a highly unusual strategy for these Geobacillus species, which degrade xylan and arabinan using one secreted enzyme each and secrete no other polysaccharide-degrading enzymes. None of the six strains contain the Geobacillus high molecular weight amylase that associates with the S-layer (Ferner-Ortner-Bleckmann et al., 2009), or the Geobacillus maltose-producing high molecular weight amylase (Diderichsen and Christiansen, 1988).
In all six strains, an orthologous cluster codes for a three-component ABC transporter system and an intracellular α-1,6-glucosidase, all under the control of a LacI family transcriptional regulator. The transcriptional regulator and the three-component ABC transporter system show >90% identity among the six strains, while the α-1,6-glucosidase shows a lower identity (70%). The cluster may act synergistically with the starch cluster to take up and degrade the branched regions of partially degraded amylopectin, or the cluster may take up and degrade more highly branched substrates such as pullulan or glycogen fragments.

Galactose and Galactoside Utilization
The six strains each show distinct metabolic capabilities for galactose utilization (Table 10). All six strains utilize galactose via the Leloir pathway of GalK, GalT, and GalE (Holden et al., 2003), similar to the pathway in most organisms including B. subtilis (Chai et al., 2012). The pathway in all six strains is under the control of a LacI family transcriptional regulator. 56T3 possesses only the Leloir pathway and no transporter or galactosidase genes, suggesting a limited ability to utilize exogenous galactose or galactans. 1MC16 lacks transporter genes, but possesses a single β-galactosidase, suggesting 1MC16 is able to utilize galactose linked to xylan or arabinan that was transported into the cell via xylan or arabinan transporter systems. Similarly, strain YS93 lacks transporter genes, but possesses a single intracellular α-galactosidase, suggesting YS93 is able to utilize galactose linked to sucrose, xylan or arabinan that was transported into the cell via the corresponding transporter system. 56T2 possesses transporter genes and genes for two intracellular β-galactosidases, suggesting the ability to utilize lactose and galactan oligosaccharides. Finally, strains MC52 and MC61 possess transporter genes and genes for two intracellular β-galactosidases and one intracellular α-galactosidase, suggesting the ability to utilize a wide range of galactose-containing oligosaccharides. None of the strains possess the extracellular α-galactosidase identified in one strain of G. stearothermophilus (Talbot and Sygusch, 1990). The intracellular α-galactosidases show significant differences in sequence. The intracellular α-galactosidases of MC52 and MC61 share 100% identity with each other and 98% identity with the G. stearothermophilus α-galactosidase identified as AgaA (Merceron et al., 2012). The intracellular α-galactosidase of YS93 shares only 81-82% identity with G. stearothermophilus AgaA and the α-galactosidases of MC52 and MC61, but shares 93% identity with the G. stearothermophilus α-galactosidase identified as AgaN (Fridjonsson et al., 1999). The GH42 β-galactosidases of MC52 and MC61 share 99% identity with the G. stearothermophilus β-galactosidase GanB (Solomon et al., 2013), while the 56T2 enzyme shares 96% identity with the G. stearothermophilus enzyme. The second β-galactosidases, the GH2 β-galactosidases of MC52 and MC61, share 100% identity with each other and 96% identity with the 56T2 enzyme. The gene for this β-galactosidase appears to be uncommon among thermophiles, being identified only in the genomes of Geobacillus sp. strain WSUCF1 (Bhalla et al., 2013) (99% identity to MC52 and MC61) and Anoxybacillus flavithermus TNO-09.006 (Caspers et al., 2013) (98% identity to MC52 and MC61). This GH2 β-galactosidase is related to similar enzymes in mesophilic species such as B. halodurans strain ATCC BAA-125 (Takami et al., 2000) (69% identity to MC52 and MC61) and Paenibacillus polymyxa strain CR1 (Eastman et al., 2014) (67% identity to MC52 and MC61).
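The percent identity values quoted throughout this section compare pairs of protein sequences. As a minimal sketch of what such a number means, the function below computes identity from two already-aligned sequences. The choice of denominator (aligned columns without gaps) is an assumption, since tools differ on this point, and the function name is hypothetical; the values in the text come from the authors' own analysis, not from this code.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Gaps are written as '-'. Assumption: columns containing a gap in
    either sequence are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For example, two five-residue peptides aligned as `MKT-LV` and `MKSALV` share four of five ungapped columns, giving 80% identity under this definition.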

Sucrose Metabolism
Sucrose is utilized by three of the six strains (MC52, MC61, and YS93) via a dedicated phosphotransferase system (PTS) transporter system. In all three strains, orthologous clusters code for three-component PTS transporter systems that use phosphoenolpyruvate to transport the sugar into the cell and phosphorylate it, generating intracellular sucrose-6-phosphate, under control of a MtlR family transcriptional regulator. The three sucrose utilization clusters also contain a gene coding for sucrose-6-phosphate hydrolase, which converts sucrose-6-phosphate to fructose and glucose-6-phosphate. The remaining three strains have no sucrose uptake system of any kind.

Discussion
In this work we report the whole genome sequences of six new xylanolytic Geobacillus strains along with the genomic analysis of their capability to degrade carbohydrates. The six sequenced Geobacillus strains described here have a range of GC contents from 43.9 to 52.5%. Based on phylogenetic analysis, three of the strains, MC52, MC61, and 56T3 may be members of a single new species, and 56T2 may also be a member of a new species. The remaining two strains clade with named Geobacillus species (Zeigler, 2005).
Whole genome sequencing and analysis of these six strains gives a first look at the wide range of carbohydrate degradation capabilities (Table 11) of Geobacillus species. All six strains are predicted to utilize fructose, arabinose, xylose, mannitol, gluconate, xylan, and pullulan (α-1,6-glucosides). The gene clusters have identical organization and the individual proteins have a high percent identity to their homologs. Significant differences exist in the ability of the sequenced strains to utilize inositol, sucrose, lactose, α-mannosides, α-1,4-glucosides and arabinan. None of the strains was able to utilize all of these carbohydrates. Complete or partial utilization pathways were present or were completely absent in a strain-specific pattern. The proteins utilized in degradation of these carbohydrates showed greater strain-to-strain variation than the proteins utilized in degradation of fructose, arabinose, xylose, mannitol, gluconate, xylan, and pullulan.
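The distinction drawn above between substrates used by every strain and substrates used in a strain-specific pattern can be expressed as a simple set comparison. The sketch below uses only a partial presence table assembled from statements in the Results (it is not a transcription of Table 11), and the variable and function names are hypothetical.

```python
STRAINS = frozenset({"YS93", "1MC16", "MC52", "MC61", "56T2", "56T3"})

# Partial table built from statements in the Results text; substrates
# whose strain distribution is not spelled out there are omitted.
PREDICTED_UTILIZATION = {
    "xylan": STRAINS,
    "mannitol": STRAINS,
    "gluconate": STRAINS,
    "fructose": STRAINS,
    "alpha-1,6-glucosides": STRAINS,
    "alpha-1,4-glucosides": STRAINS - {"YS93"},
    "sucrose": frozenset({"MC52", "MC61", "YS93"}),
    "inositol": frozenset({"1MC16"}),
    "alpha-mannosides": frozenset({"1MC16"}),
}


def split_core_variable(table, strains):
    """Separate substrates used by all strains from strain-specific ones."""
    core = sorted(s for s, users in table.items() if users == strains)
    variable = sorted(s for s, users in table.items() if users != strains)
    return core, variable


CORE, VARIABLE = split_core_variable(PREDICTED_UTILIZATION, STRAINS)
```

With this partial table, `CORE` holds the five substrates predicted for all six strains and `VARIABLE` holds the four with strain-specific distributions.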
Our group has sequenced and analyzed the genomes of a number of biomass degraders including three Cellulomonas spp. (Christopherson et al., 2013), Bacillus cellulosilyticus, Fibrobacter succinogenes (Brumm et al., 2011b; Suen et al., 2011) and Dictyoglomus turgidum (Brumm et al., 2011a). Comparison of the genomes of these biomass degraders to those of the six Geobacillus spp. shows three major differences between the strategies employed by the Geobacillus and these other diverse organisms.
The Geobacillus spp. in this work were selected for their ability to hydrolyze MUX or MUC. Based on enzymatic assays, all six strains were able to utilize xylan, but only two strains, MC52 and MC61, were able to utilize arabinan. The genes for these activities were found in a large, conserved pentosan degradation cluster. Five of the six pentosan clusters include a region involved in arabinan degradation and all six include a region for xylan degradation, with over 50 possible genes in the combined pentosan cluster. The organization of the genes within the cluster is highly conserved in all the Geobacillus strains studied, and more importantly, none of the genes involved in pentosan metabolism are found outside this cluster. In the six diverse biomass degraders, pentosan degradation genes are not clustered, but are distributed randomly throughout the genomes. Random distributions of pentosan degradation genes are seen in other biomass degraders such as Bacillus, Clostridium, and Streptomyces species. These observations suggest that the large, single pentosan degradation cluster is a unique feature of Geobacillus spp. The evolutionary advantages of a single cluster versus a random distribution are unclear, but a single cluster may be an adaptation to life under extreme conditions. The Geobacillus pentosan degradation cluster is part of a ∼200 kb unique super-cluster, containing five to eight distinct carbohydrate degradation clusters in a single genomic region, a feature not seen in other sequenced strains in related genera. The Geobacillus spp. are also unique in their dependence on a minimum number of secreted enzymes for utilization of carbohydrates. Only two secreted enzymes, a xylanase and an arabinanase, are used in degradation of xylan and arabinan. Starch degradation utilizes three secreted enzymes. None of the Geobacillus spp. secrete xylosidases or arabinofuranosidases.
In contrast to the Geobacillus spp., most other Gram-positive pentosan-degraders secrete multiple xylanases as well as multiple xylosidases. For example, Cellulomonas flavigena secretes 19 xylanases and 3 xylosidases, Cellulomonas fimi secretes 6 xylanases and 4 xylosidases, and Cellulomonas gilvus secretes 6 xylanases and 5 xylosidases (Christopherson et al., 2013). In further contrast to the Geobacillus spp., many other Gram-positive pentosan-degraders secrete combinations of other biomass-degrading enzymes such as cellulases, mannanases, xyloglucanases, pectinases, and pectate lyases. The genomes of the Geobacillus spp. lack orthologs of these secreted enzymes, indicating that Geobacillus spp. may target a limited range of carbohydrate polymers in intact biomass, or degrade biomass as part of a thermophilic consortium whose other members possess these activities.
Another unique feature of the Geobacillus pentosan cluster enzymes is the lack of targeting by attached carbohydrate binding modules (CBM) (Lombard et al., 2014). CBM modules are believed to improve enzyme efficiency by providing specific noncatalytic binding to the correct substrate (Boraston et al., 2004). CBM modules are present in many of the xylanases produced by thermophilic Gram-positive organisms including Clostridium thermocellum and Caldicellulosiruptor species (http://www.cazy.org/) (Lombard et al., 2014). The lack of CBM modules may indicate that the Geobacillus enzymes predate the evolution of CBM modules. Alternatively, the lack of CBM modules may give Geobacillus enzymes the ability to utilize a broader range of substrates at the cost of a slower rate of hydrolysis.
The sequencing and genomic analysis of these six Geobacillus spp. confirm the belief that Geobacillus spp. are an excellent source of a variety of thermophilic enzymes with industrial applications. The variety of enzymes observed in a number of pathways, as well as the absence of previously identified Geobacillus enzymes such as the maltogenic (Diderichsen and Christiansen, 1988) and high molecular weight (Ferner-Ortner-Bleckmann et al., 2009) amylases, suggests that sufficient genetic variability exists within the genus to supply additional new enzymes with novel applications.