A phylogenomic and molecular signature based approach for characterization of the phylum Spirochaetes and its major clades: proposal for a taxonomic revision of the phylum

The Spirochaetes species cause many important diseases including syphilis and Lyme disease. Apart from their distinctive endoflagella, no molecular or biochemical characteristics are presently known that are specific for either all Spirochaetes or its different families. We report detailed comparative and phylogenomic analyses of protein sequences from Spirochaetes genomes to understand their evolutionary relationships and to identify molecular signatures for this group. These studies have identified 38 conserved signature indels (CSIs) that are specific for either all members of the phylum Spirochaetes or its different main clades. Of these CSIs, a 3 aa insert in the FlgC protein is uniquely shared by all sequenced Spirochaetes, providing a molecular marker for this phylum. Seven, six, and five CSIs in different proteins are specific for members of the families Spirochaetaceae, Brachyspiraceae, and Leptospiraceae, respectively. Of the 19 other identified CSIs, 3 are uniquely shared by members of the genera Sphaerochaeta, Spirochaeta, and Treponema, whereas 16 others are specific for the genus Borrelia. A monophyletic grouping of the genera Sphaerochaeta, Spirochaeta, and Treponema distinct from the genus Borrelia is also strongly supported by phylogenetic trees based upon concatenated sequences of 22 conserved proteins. The molecular markers described here provide novel and more definitive means for identification and demarcation of different main groups of Spirochaetes. To accommodate the extensive genetic diversity of the Spirochaetes as revealed by different CSIs and phylogenetic analyses, it is proposed that the four families of this phylum should be elevated to the order level taxonomic ranks (viz. Spirochaetales, Brevinematales ord. nov., Brachyspiriales ord. nov., and Leptospiriales ord. nov.). It is further proposed that the genera Borrelia and Cristispira be transferred to a new family Borreliaceae fam. nov. within the order Spirochaetales.


INTRODUCTION
The phylum Spirochaetes consists of a large group of motile bacteria which are widespread in the environment and are highly prevalent disease causing agents (Seshadri et al., 2004; Paster, 2011a). The members of this phylum share a distinguishing morphological feature, the endoflagella, a special class of flagella that folds back into the cell and remains within the periplasm (Li et al., 2008). Most spirochetes have one or more of these structures protruding from either pole of the cell, forming an axial filament, which gives rise to the characteristic jerky, corkscrew-like motility of the members of the phylum (Li et al., 2008; Paster, 2011a). Currently, the phylum Spirochaetes consists of 15 genera which are highly divergent in terms of their lifestyle and other characteristics (Euzéby, 2013). They live in marine sediments, deep within soil, commensally in the gut of arthropods, including termites, as well as in vertebrates as obligate parasites. They can also be free-living or host-associated, pathogenic or non-pathogenic, and aerobic or anaerobic (Paster, 2011a). There is also enormous variability in the genome sizes and organization of Spirochaetes species (Table 1). However, despite the diverse characteristics of its members, the phylum Spirochaetes currently comprises a single class, Spirochaetia, containing a single order, Spirochaetales, which is made up of four families (viz. Spirochaetaceae, Brachyspiraceae, Leptospiraceae, and Brevinemataceae) (Paster, 2011a; Euzéby, 2013).
There are four clinically important genera of the phylum Spirochaetes whose species are the causative agents of many globally prevalent illnesses: Treponema, Borrelia, Leptospira, and Brachyspira (Bellgard et al., 2009). Of these, Treponema and Borrelia are members of the family Spirochaetaceae, which also includes the genera Clevelandina, Cristispira, Diplocalyx, Hollandina, Pillotina, Spirochaeta, and Sphaerochaeta (Paster, 2011b; Euzéby, 2013). However, the genera Clevelandina, Diplocalyx, Hollandina, and Pillotina have yet to be isolated and grown in pure or mixed culture and their phylogeny is based largely on analyses of morphological characteristics (Bermudes et al., 1988). Treponema pallidum subspecies pallidum is the causative agent of syphilis, a sexually transmitted disease which affects at least 25 million adults worldwide (Gerbase et al., 1998). Other members of the genus Treponema are responsible for diseases such as bejel, yaws, and pinta and play an important role in periodontal diseases (Ellen and Galimanas, 2005; Visser and Ellen, 2011; Smajs et al., 2012). Members of the genus Borrelia, namely Borrelia burgdorferi and Borrelia recurrentis, are important human pathogens that cause Lyme disease and relapsing fever, respectively (Dworkin et al., 2008; Nau et al., 2009; Cutler, 2010). Leptospira and Brachyspira are members of the families Leptospiraceae and Brachyspiraceae, and are the causative agents of leptospirosis and intestinal spirochaetosis, respectively (Adler and de la Peña Moctezuma, 2010; Anthony et al., 2013; Euzéby, 2013). Despite the importance of species of the phylum Spirochaetes in causing many important human diseases, the evolutionary relationships of species within this phylum remain poorly understood and no distinguishing molecular features are known that are specific for all members of the different families (Olsen et al., 2000; Paster and Dewhirst, 2000; Paster, 2011a).
The availability of genome sequences provides a valuable resource to identify/discover novel molecular markers that are helpful in these regards and to gain insights into their evolutionary relationships. Genomes from 48 species covering the three main families of the phylum Spirochaetes are now available in the NCBI database (Table 1) (NCBI, 2013). The availability of genome sequences allows for the use of comparative genomic approaches to identify molecular markers that are specific for different bacterial taxa at various taxonomic levels. Using genomic sequences, one useful approach pioneered by our lab involves the discovery of Conserved Signature insertions/deletions (i.e., Indels) or CSIs present in protein sequences that are specific for different groups of organisms. Due to the specificity of these CSIs for particular groups/taxa of species, they provide valuable molecular markers of common evolutionary descent (i.e., synapomorphies) for identification and demarcation of different phylogenetic/taxonomic clades of organisms in molecular terms. Additionally, based upon the presence or absence of these CSIs in outgroup species, it is possible to infer whether the observed genetic change is an insert or a deletion and a rooted phylogenetic relationship among different groups can be derived (Baldauf and Palmer, 1993;Gupta, 1998;Griffiths and Gupta, 2004;Gao and Gupta, 2012a).
In this work, we report the results of comparative analyses on protein sequences for the phylum Spirochaetes to identify molecular markers (CSIs) that are specific for the species from the phylum and its subgroups, or those that provide information regarding interrelationships among them. These studies have led to identification of 38 CSIs providing novel molecular markers for the species from the phylum and clarifying their evolutionary relationships. Additionally, we have also constructed a phylogenetic tree for all genome sequenced members of the phylum Spirochaetes based upon concatenated sequences for 22 conserved proteins. The inferences from different identified CSIs are strongly supported by the branching pattern of species in the phylogenetic tree indicating that the identified CSIs provide reliable molecular markers for the indicated groups of Spirochaetes.

PHYLOGENETIC SEQUENCE ANALYSIS
Phylogenetic analysis was performed on a concatenated sequence alignment of 22 highly conserved proteins (viz. UvrD, GyrA, GyrB, RpoB, RpoC, EF-G, EF-Tu, RecA, ArgRS, IleRS, ThrRS, TrpRS, SecY, DnaK, and ribosomal proteins L2, L5, S2, S3, and S9) which have been widely used for phylogenetic analysis (Harris et al., 2003; Gao and Gupta, 2012a). Sequences for these proteins were obtained from the NCBI database for representative strains of all the sequenced Spirochaetes species (Table 1) and for Thermosynechococcus elongatus and Nostoc flagelliforme, which were used to root the tree. Multiple sequence alignments for these proteins were created using Clustal_X 1.83 (Jeanmougin et al., 1998) and concatenated into a single alignment file. Poorly aligned regions were removed from this alignment file using Gblocks 0.91b (Castresana, 2000). The resulting alignment, which contained 7411 aligned amino acids, was used for phylogenetic analysis. The maximum likelihood (ML) and neighbor joining (NJ) trees based on 100 bootstrap replicates of this alignment were constructed using MEGA 5.1 (Tamura et al., 2011) employing the Whelan and Goldman (Whelan and Goldman, 2001) and Jones-Taylor-Thornton (Jones et al., 1992) substitution models, respectively.
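The concatenation step described above can be illustrated with a minimal Python sketch. This is not the procedure or software used in this study; the function name `concatenate_alignments`, the taxon labels, and the toy sequences are all hypothetical, and padding a missing protein with gaps is an assumption made for illustration only.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-protein alignments into one concatenated row per taxon.

    alignments: dict mapping protein name -> {taxon: aligned sequence}
    taxa: list of taxon names to include in the output
    """
    concatenated = {taxon: [] for taxon in taxa}
    for protein, aligned in alignments.items():
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"{protein}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this protein is padded with gaps.
            concatenated[taxon].append(aligned.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in concatenated.items()}


# Toy data (hypothetical sequences, not taken from the real alignments).
toy = {
    "RecA": {"T_pallidum": "MKA-L", "B_burgdorferi": "MKAQL"},
    "GyrB": {"T_pallidum": "GSE", "B_burgdorferi": "GTE", "L_interrogans": "GSD"},
}
supermatrix = concatenate_alignments(
    toy, ["T_pallidum", "B_burgdorferi", "L_interrogans"]
)
```

Each row of `supermatrix` has the same length, which is the property that downstream trimming and tree-building tools rely on.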
A 16S rRNA gene sequence tree was also created for 107 sequences that included representative species for all 11 cultured Spirochaetes genera. 16S rRNA gene sequences larger than 1200 bp were obtained for all type species classified under the phylum Spirochaetes in release 114 of the SILVA database (Quast et al., 2013). Information for these sequences is provided in Supplemental Table 1. A ML tree based on these sequences was created using 100 bootstrap replicates of the 16S rRNA sequence alignments in MEGA 5.1 (Tamura et al., 2011) employing the General Time-Reversible (Tavaré, 1986) substitution model.
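As an illustration of what a bootstrap replicate is, the following Python sketch resamples alignment columns with replacement. It is a generic description of nonparametric bootstrapping, not a reproduction of the MEGA 5.1 implementation; the function name `bootstrap_columns` and the toy alignment are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
toy_alignment = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "ATGTAC"}  # hypothetical
replicates = [bootstrap_columns(toy_alignment, rng) for _ in range(100)]
```

A tree is then inferred from each replicate, and the support for a clade is the fraction of replicate trees in which that clade appears.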

IDENTIFICATION OF MOLECULAR MARKERS (CSIs)
To identify CSIs that are commonly shared by different groups of Spirochaetes, BLASTp searches (Altschul et al., 1997) were performed on each protein in the genome of Treponema pallidum subspecies pallidum strain Nichols. These searches were performed using the default BLAST parameters against all available sequences in the GenBank non-redundant database. For those proteins for which high scoring homologs (E-values < 1e-20) were present in other species from the phylum Spirochaetes and some other bacterial groups, multiple sequence alignments were created using the Clustal_X 1.83 program (Jeanmougin et al., 1998). These alignments were visually inspected for the presence of insertions or deletions that were flanked on both sides by at least 4-5 conserved amino acid residues in the neighboring 30-40 amino acids. Indels that were not flanked by conserved regions were not further considered, as they do not provide useful molecular markers (Gupta, 1998; Gao and Gupta, 2012a; Adeolu and Gupta, 2013). The specificity of potentially useful indels for members of the Spirochaetes was further evaluated by carrying out detailed BLASTp searches on short sequence segments containing the indel and the flanking conserved regions (60-100 amino acids long). To ensure that the identified signatures are only present in the Spirochaetes homologs, a minimum of 250 BLAST hits with the highest similarity to the query sequence were examined for the presence or absence of these CSIs. In this work, we report the results of only those CSIs that are specific for different groups of Spirochaetes and where similar CSIs were not observed in any other bacteria in the top 250 BLAST hits. The sequence alignment files presented here contain sequence information for all sequenced genera within Spirochaetes. However, due to size constraints, different strains and/or species of the sequenced genera are not shown as they all exhibited similar patterns.

www.frontiersin.org July 2013 | Volume 4 | Article 217 | 3
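The flanking-conservation criterion was applied by visual inspection in this work. Purely as an illustration, it could be approximated programmatically as in the Python sketch below. The function names, the defaults `min_conserved=4` and `window=30` (chosen from the lower ends of the ranges stated above), and the toy sequences are assumptions, and "conserved" is simplified here to mean an identical, ungapped residue in every sequence.

```python
def conserved_columns(alignment, start, stop):
    """Count columns in [start, stop) where every sequence has the same residue."""
    count = 0
    for i in range(max(start, 0), min(stop, len(alignment[0]))):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            count += 1
    return count


def find_flanked_indels(alignment, min_conserved=4, window=30):
    """Return (start, stop) gap runs flanked on both sides by conserved columns."""
    length = len(alignment[0])
    gapped = [any(seq[i] == "-" for seq in alignment) for i in range(length)]
    hits, i = [], 0
    while i < length:
        if gapped[i]:
            j = i
            while j < length and gapped[j]:
                j += 1
            left = conserved_columns(alignment, i - window, i)
            right = conserved_columns(alignment, j, j + window)
            if left >= min_conserved and right >= min_conserved:
                hits.append((i, j))
            i = j
        else:
            i += 1
    return hits


# Hypothetical toy alignments: the first has a 3 aa indel in a conserved
# region, the second has the same gap but no conserved flanks.
flanked = ["GDPRTAKQSWEVL", "GDPRTAKQSWEVL", "GDPRT---SWEVL", "GDPRT---SWEVL"]
unflanked = ["ACDEFAKQGHIKL", "MNPQR---STVWY"]
flanked_hits = find_flanked_indels(flanked)
unflanked_hits = find_flanked_indels(unflanked)
```

A candidate that passes this filter would still need the specificity check against the top BLAST hits described above before it could be treated as a CSI.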

GENOMIC CHARACTERISTICS OF THE SEQUENCED SPIROCHAETES
There are currently 48 genome sequenced species of Spirochaetes. Table 1 lists some characteristics of representative strains for all Spirochaetes species that have been completely sequenced. The genome sizes of these species of Spirochaetes showed a large amount of variation, ranging from 0.92 to 4.7 Mb in length. The G + C content of these species also showed a large amount of variation, ranging from 25.8 to 60.9%. The members of the phylum Spirochaetes also exhibited a large amount of variation in genome structure. The genome structure of members of the genus Borrelia is among the most unusual in prokaryotes (Chaconas, 2005; Chaconas and Kobryn, 2010). The Borrelia genome consists of 6-24 DNA segments, including a linear chromosome about 900 kb in length which is accompanied by multiple essential linear and circular plasmids ranging from 5 to 220 kb in length (Chaconas and Kobryn, 2010). Linear chromosomes and plasmids terminated by covalently closed hairpin telomeres are particularly uncommon genomic features among prokaryotes and are only found in the genomes of the Borrelia species and the species Agrobacterium tumefaciens (Goodner et al., 2001; Kobryn, 2007; Chaconas and Kobryn, 2010). Members of the genus Leptospira also have an unusual genome structure consisting of two circular chromosomes, a large chromosome about 3.6-4.2 Mb in length and a smaller chromosome about 300 kb in length (Ren et al., 2003; Picardeau et al., 2008).

PHYLOGENETIC ANALYSES OF THE SEQUENCED SPIROCHAETES
The branching order of species within the phylum Spirochaetes has primarily been determined using 16S rRNA sequence based phylogenetic trees (Paster, 2011a). In these trees, the four families within the phylum branch into distinct monophyletic clades separated by long branches. However, the interrelationships of members of the family Spirochaetaceae are not reliably resolved (Paster, 2011b) (Figure 2). Phylogenetic trees derived from large numbers of conserved genes/proteins provide greater resolving power than those based on any single gene or protein (Rokas et al., 2003; Ciccarelli et al., 2006; Wu et al., 2009; Gao and Gupta, 2012a). In this study, we have constructed phylogenetic trees of the genome sequenced members of the phylum Spirochaetes listed in Table 1 using 22 conserved housekeeping and ribosomal proteins. The trees were constructed using both the NJ and ML methodologies and the branching patterns generated by both methodologies were highly similar (Figure 1).
In the concatenated protein trees, which are rooted using the species T. elongatus and N. flagelliforme, the members of the three sequenced families of Spirochaetes (viz. Spirochaetaceae, Brachyspiraceae, and Leptospiraceae) formed three distinct monophyletic clades (Figure 1). Additionally, the branching order of members of the family Spirochaetaceae is well-resolved in the concatenated protein trees. Within the Spirochaetaceae clade, the genera Treponema, Spirochaeta, and Sphaerochaeta formed a well-supported monophyletic clade separated from the members of the genus Borrelia by a long branch. The Treponema, Spirochaeta, and Sphaerochaeta clade exhibited a large amount of diversity and consisted of a number of strongly supported subclades. Members of each of the sequenced genera within Spirochaetes formed monophyletic clusters with the exception of the genus Spirochaeta, where Spirochaeta smaragdinae branched with the genus Sphaerochaeta. Another Spirochaeta species, S. caldaria, which branched within the genus Treponema, has recently been reclassified as Treponema caldaria (Abt et al., 2013). The remaining Spirochaeta species (viz. S. thermophila and S. africana) branched deeply within the Treponema, Spirochaeta, and Sphaerochaeta clade (Figure 1). The monophyletic clade containing all the members of the genus Borrelia consisted of two highly distinct subclades, one containing Borrelia burgdorferi and related species and the other containing Borrelia recurrentis and related species.
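The notion of a monophyletic clade used above can be made concrete with a small Python sketch that tests whether a set of taxa corresponds exactly to one subtree of a rooted tree. The nested-tuple tree encoding, the function names, and the toy topology are assumptions for illustration; the toy tree deliberately uses unresolved nodes and is not a transcription of Figure 1.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the taxa in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted toy tree at genus level, encoded as nested tuples.
toy_tree = (
    (("Treponema", "Spirochaeta", "Sphaerochaeta"), "Borrelia"),
    "Brachyspira",
    "Leptospira",
)
tss_clade = is_monophyletic(toy_tree, {"Treponema", "Spirochaeta", "Sphaerochaeta"})
mixed_group = is_monophyletic(toy_tree, {"Treponema", "Borrelia"})
```

In this toy tree `tss_clade` is true and `mixed_group` is false, because no subtree contains Treponema and Borrelia without also containing Spirochaeta and Sphaerochaeta.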
The 16S rRNA tree shown in Figure 2 includes all of the members included in the concatenated protein tree as well as other cultured members of the phylum Spirochaetes which have yet to be genome sequenced. The branching patterns in the 16S rRNA phylogenetic tree were similar to those observed in the concatenated protein tree; all families within the phylum branched distinctly. Within the cluster consisting of members of the family Spirochaetaceae the genera Treponema, Sphaerochaeta, and most members of the genus Spirochaeta formed a monophyletic clade. The genera Borrelia and Cristispira also formed a well-supported monophyletic clade that was distinct from the genera Treponema, Spirochaeta, and Sphaerochaeta within the Spirochaetaceae clade. The different sequenced members of the genus Borrelia also formed two distinct clusters in the 16S rRNA tree (Figure 2).

CSI SPECIFIC FOR THE PHYLUM SPIROCHAETES
CSIs that are restricted to a group of related species are a novel class of molecular marker with high utility for evolutionary studies (Gupta, 1998; Rokas et al., 2003; Gupta, 2009; Gao and Gupta, 2012a). The co-occurrence of multiple CSIs in different species may be due to shared evolutionary history, convergent evolution, or lateral gene transfer. However, the unique shared presence of multiple CSIs in a diverse range of proteins by a related group of species is most parsimoniously explained by the occurrence of the rare genetic changes that resulted in these CSIs in a common ancestor of the group, followed by vertical transmission of these CSIs to various descendant species (Gupta, 1998; Rokas and Holland, 2000; Gogarten et al., 2002; Gupta and Griffiths, 2002; Gao and Gupta, 2012a). Hence, these CSIs represent molecular synapomorphies of common evolutionary descent and they provide useful markers for identifying different groups of organisms in molecular terms and for understanding their interrelationships independently of phylogenetic trees (Gupta, 1998; Gupta and Griffiths, 2002; Gao and Gupta, 2012a,b). The CSI-based approach has recently been used to propose important taxonomic changes for a number of groups of bacteria (viz. Chloroflexi, Coriobacteriia, Neisseriales, and Bacillus) at different taxonomic ranks (Gupta et al., 2012; Adeolu and Gupta, 2013; Bhandari et al., 2013). In the present work, we have completed comprehensive genomic analyses to identify CSIs that are primarily restricted to the phylum Spirochaetes or its subgroups. Information regarding the species specificities of these CSIs and their evolutionary significance is discussed below. Our analyses have identified 38 CSIs in diverse and important proteins that are specific for members of the Spirochaetes.
One CSI has been identified that is specifically found in all of the sequenced members of the phylum Spirochaetes and not found in homologous proteins from any other bacterial species (in the top 250 Blast hits) (Figure 3). This CSI consists of a 3 amino acid (aa) insertion located in the flagellar basal-body rod protein FlgC, a component of the basal body which comprises a large portion of the flagella (Macnab, 2003). This CSI represents a unique molecular characteristic of the phylum Spirochaetes and may be related to the characteristic flagellar morphology shared by members of the phylum.

CSIs THAT ARE SPECIFIC FOR DIFFERENT FAMILIES OF SPIROCHAETES
Many of the CSIs identified by our analyses are specific for the different sequenced families within the phylum Spirochaetes (viz. Spirochaetaceae, Brachyspiraceae, and Leptospiraceae) allowing us to demarcate these families in clear molecular terms.
Seven of the CSIs identified by our analyses are specific for the family Spirochaetaceae. One example of a CSI that is specific for the species from the family Spirochaetaceae is a 15 aa insertion in a highly conserved region of the protein phosphoribosylpyrophosphate synthetase, which is uniquely found in all members of the family Spirochaetaceae but not in any other sequenced bacterial groups (Figure 4). Sequence information for 6 other CSIs in diverse proteins (viz. Alanyl-tRNA synthetase, phosphoribosylpyrophosphate synthetase, preprotein translocase SecY, peptide chain release factor 2, DNA mismatch repair protein MutS, and DNA mismatch repair protein MutL) that are also specifically present in members of the family Spirochaetaceae is presented in Supplementary Figures 1-6 and some of their characteristics are summarized in Table 2.
Our analyses have also identified 6 CSIs in diverse proteins that are specifically found in members of the family Brachyspiraceae and absent in all other bacterial groups. One of these Brachyspiraceae-specific CSIs, a 1 aa insertion, is present in the flagellar hook-associated protein FlgK, a protein involved in flagellar hook morphogenesis (Figure 5A) (Homma et al., 1990). Another Brachyspiraceae-specific CSI, a 1 aa insertion, is found in a highly conserved region of DNA polymerase I (Figure 5B). These proteins are highly conserved and essential components of members of the family Brachyspiraceae, and they contain conserved molecular changes not found in any other sequenced bacterial group. Sequence information for 4 other CSIs in three other proteins (viz. valyl-tRNA synthetase, ATP-dependent protease La, and glutamyl-tRNA amidotransferase subunit B) that are also specifically present in members of the family Brachyspiraceae is presented in Supplemental Figures 7-10 and some of their characteristics are summarized in Table 3.
We have also identified 5 CSIs that are uniquely present in members of the family Leptospiraceae. Two examples of such CSIs are shown in Figure 6. The first of these CSIs, an 8 aa insertion in the 50S ribosomal protein L14, is shown in Figure 6A, and the other CSI, a 4 aa insert in alanyl-tRNA synthetase, is shown in Figure 6B. Both of these CSIs are found in members of the family Leptospiraceae and are absent in every other sequenced bacterial group. Sequence information for 4 other CSIs in diverse proteins (viz. 30S ribosomal protein S2, flagellar basal-body rod protein FlgG, and flagellar filament core protein FlaB) that are also specifically present in members of the family Leptospiraceae is presented in Supplemental Figures 11-14 and some of their characteristics are summarized in Table 4.

CSIs DISTINGUISHING TWO CLADES WITHIN THE FAMILY Spirochaetaceae
In addition to the numerous CSIs identified in our analyses for the sequenced families within the phylum Spirochaetes, we have also identified a number of CSIs that elucidate the relationship of the genera within the family Spirochaetaceae. Three of the identified CSIs are uniquely shared by the genera Treponema, Spirochaeta, and Sphaerochaeta. One example of a CSI specific to these three genera, a 1 aa deletion in the 30S ribosomal protein S13, a component of the protein translation complex, is shown in Figure 7A. Sequence information for 2 other CSIs specifically found in these three genera is provided in Table 5 and Supplemental Figures 14, 15. An additional 16 CSIs were uniquely shared by members of the genus Borrelia. One example of a CSI consisting of a 6 aa insertion in the glycolysis related protein, phosphofructokinase, that is specific to the members of the genus Borrelia is shown in Figure 7B. Fifteen other CSIs were also specifically found in members of the genus Borrelia and information for them is presented in Table 5 and Supplemental Figures 16-30.
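The CSI counts reported in the preceding sections can be tallied with a short Python check. The numbers are those stated in the text; the dictionary keys and variable names are labels chosen here for illustration.

```python
# CSI counts per group, as reported in the Results sections above.
csi_counts = {
    "phylum Spirochaetes": 1,
    "family Spirochaetaceae": 7,
    "family Brachyspiraceae": 6,
    "family Leptospiraceae": 5,
    "Treponema/Spirochaeta/Sphaerochaeta clade": 3,
    "genus Borrelia": 16,
}

family_level = sum(n for group, n in csi_counts.items() if group.startswith("family"))
within_spirochaetaceae = (
    csi_counts["Treponema/Spirochaeta/Sphaerochaeta clade"]
    + csi_counts["genus Borrelia"]
)
total = sum(csi_counts.values())
```

The tally gives 18 family-level CSIs and 19 CSIs for the two groups within the family Spirochaetaceae, which together with the single phylum-level CSI account for the 38 CSIs reported.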

DISCUSSION
The phylum Spirochaetes is currently distinguished from other bacteria on the basis of both branching in 16S rRNA sequence based phylogenies and the presence of the endoflagella that characterizes the phylum (Paster, 2011a;Euzéby, 2013). Apart from the presence of endoflagella, no reliable morphological, biochemical, or molecular characteristics are known that are specifically shared by all members of the phylum. Additionally, the phylum contains four divergent lineages, contained within a single class/order, that are demarcated largely on the basis of 16S rRNA sequence based phylogenies (Paster, 2011a). In this work, we have utilized comparative genomic techniques to identify large numbers of novel molecular signatures (CSIs) that are distinctive characteristics of either all members of the phylum Spirochaetes or for its different subgroups at multiple phylogenetic levels and which can be used to demarcate these groups in more definitive molecular terms. A summary diagram depicting the species distribution of the identified CSIs is shown in Figure 8.
The phylum Spirochaetes is rare in having a defining morphological characteristic, the endoflagella, which correlates with the clustering of the members of the phylum in 16S rRNA phylogenetic trees (Ludwig and Klenk, 2001; Cavalier-Smith, 2002; Paster, 2011a). The endoflagella is a unique feature of the phylum and is thought to be responsible for the great pathogenic and ecological diversity of its many members (Ren et al., 2003). Of the 38 CSIs we have identified in this study, one was uniquely shared by all 48 members of the phylum Spirochaetes and absent in every other sequenced group of bacteria. The identified CSI is located in the flagellar basal-body rod protein FlgC, a core component of the motor complex of the flagella (Macnab, 2003). This CSI provides a novel means to distinguish the members of the phylum from all [...] in protein-protein interactions (Akiva et al., 2008; Singh and Gupta, 2009; Gupta, 2010). Thus, the CSI identified in FlgC likely plays an important role in the cellular functions of the flagellar basal-body.

The phylum Spirochaetes contains 4 main lineages (viz. Spirochaetaceae, Brachyspiraceae, Leptospiraceae, and Brevinemataceae). These lineages have historically been distinguished from each other by their biochemical characteristics and their 16S rRNA gene sequences (Harwood and Canale-Parola, 1984; Paster et al., 1991; Paster, 2011a). In this study we have also identified 18 CSIs in a diverse range of proteins, each specific to one of the main sequenced lineages of the phylum Spirochaetes (viz. Spirochaetaceae, Brachyspiraceae, and Leptospiraceae), which serve to distinguish these lineages from each other and from all other bacteria. Seven of these CSIs were specific for the family Spirochaetaceae, 6 were specific for the family Brachyspiraceae, and 5 were specific for the family Leptospiraceae. Each of these lineages also branches distinctly and is separated by long branches in both 16S rRNA based and concatenated protein based phylogenetic trees (Figures 1, 2). This molecular and phylogenetic evidence supports the current division of these lineages. However, the large number of CSIs discovered for each of these groups and their genetic distances suggest that these lineages may represent higher taxonomic divisions (viz. orders or classes) than currently recognized. It is noteworthy that two of the CSIs that are specific for the Brachyspiraceae family and one that is specific for the Leptospiraceae are again found in flagella-related proteins (viz. FlgK, FlgB, FlgG), indicating that there might be interesting differences in the structures and/or functions of flagella within the Spirochaete families.
The family Spirochaetaceae, which contains the genera Borrelia, Clevelandina, Cristispira, Diplocalyx, Hollandina, Pillotina, Sphaerochaeta, Spirochaeta, and Treponema, is the most diverse of the lineages within the phylum Spirochaetes (Paster, 2011b; Euzéby, 2013). The interrelationships between the genera within this family are not reliably resolved by 16S rRNA sequence analysis (Paster, 2011b) (Figure 2). In this study we have identified 19 CSIs which serve to delineate at least certain relationships within the family Spirochaetaceae. Three of the CSIs identified are specifically found in members of the genera Sphaerochaeta, Spirochaeta, and Treponema and 16 additional CSIs were identified that are specifically found in members of the genus Borrelia. These CSIs suggest that the genera Sphaerochaeta, Spirochaeta, and Treponema shared a common ancestor distinct from the members of the genus Borrelia. In our concatenated protein phylogenetic tree, the genera Sphaerochaeta, Spirochaeta, and Treponema formed a well-supported monophyletic clade, which was separated from the members of the genus Borrelia by a long branch, supporting the relationship delineated by these CSIs. Both clades also exhibit considerable phylogenetic diversity. The clade consisting of the genera Sphaerochaeta, Spirochaeta, and Treponema contains a number of distinct smaller subclades while the members of the genus Borrelia form two highly distinct clades in the phylogenetic trees. However, further work to identify molecular markers will be required to determine the significance of the branching of these subclades. The genus Cristispira has not had its genome sequenced, but it branches reliably with the members of the genus Borrelia in 16S rRNA based phylogenetic trees, suggesting that some, if not all, of the Borrelia specific CSIs identified in this study may also be found in Cristispira (Paster, 2011b) (Figure 2). The remaining members of the family Spirochaetaceae (viz. Clevelandina, Diplocalyx, Hollandina, and Pillotina) have been identified in the hindguts of termites and cockroaches but have yet to be isolated and grown in pure or mixed culture. The current placement of the identified members of Clevelandina, Diplocalyx, Hollandina, and Pillotina in distinct genera within the family Spirochaetaceae is ambiguous and based largely on analyses of morphological characteristics (Bermudes et al., 1988). No genome or 16S rRNA sequences are currently available from these genera for phylogenetic analysis. However, the observations presented in this report suggest that the family Spirochaetaceae contains at least two distinct monophyletic groups: one consisting of the genera Sphaerochaeta, Spirochaeta, and Treponema and another consisting of the genera Borrelia and Cristispira.

TAXONOMIC IMPLICATIONS
The results presented here show that the main lineages of the phylum Spirochaetes are evolutionarily distinct. The families Spirochaetaceae, Brachyspiraceae, and Leptospiraceae are distinguished from each other and from all other bacteria by large numbers of identified CSIs in widely distributed proteins. Additionally, these three families branch distinctly in both 16S rRNA based and concatenated protein based phylogenetic trees. The results also show that the family Spirochaetaceae consists of two distinct monophyletic groups, whose distinctiveness is supported both by molecular evidence, in the form of the large numbers of discovered CSIs, and by phylogenetic analyses. Both of these groups also exhibit a large amount of phylogenetic diversity which is currently not reflected in their taxonomy. The current taxonomic organization of the phylum Spirochaetes places all of the main lineages (viz. Spirochaetaceae, Brachyspiraceae, Leptospiraceae, and Brevinemataceae) into a single order. To adequately recognize both the distinctiveness of the main lineages within the phylum and the distinctiveness and diversity of the two main groups within the family Spirochaetaceae, the taxonomic rank of the main lineages must be raised. We therefore propose the following taxonomic rearrangement of the phylum: the family Leptospiraceae is transferred to the novel order Leptospiriales ord. nov., the family Brachyspiraceae to the novel order Brachyspiriales ord. nov., and the family Brevinemataceae to the novel order Brevinematales ord. nov., all within the class Spirochaetia; and the genera Borrelia and Cristispira are transferred to the novel family Borreliaceae fam. nov. within the order Spirochaetales (Figure 8). The emended descriptions of the order Spirochaetales and the family Spirochaetaceae, as well as descriptions of the new taxonomic groups Leptospiriales ord. nov., Brachyspiriales ord. nov., Brevinematales ord. nov., and Borreliaceae fam. nov., are provided below.
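For readers who want to work with the proposed arrangement programmatically, the hierarchy can be written as a nested mapping. The following is a minimal sketch, not part of the original analysis: the names `PROPOSED_TAXONOMY` and `lineage` are hypothetical, and the genus lists include only genera named in this section (type genera for Brachyspiraceae and Leptospiraceae; none are listed here for Brevinemataceae).

```python
# Proposed arrangement as stated in the text: class -> order -> family -> genera.
# Genus lists are limited to genera named in this section; they are not
# complete inventories of each family.
PROPOSED_TAXONOMY = {
    "Spirochaetia": {
        "Spirochaetales": {
            "Spirochaetaceae": [
                "Clevelandina", "Diplocalyx", "Hollandina", "Pillotina",
                "Sphaerochaeta", "Spirochaeta", "Treponema",
            ],
            "Borreliaceae": ["Borrelia", "Cristispira"],
        },
        "Brachyspiriales": {"Brachyspiraceae": ["Brachyspira"]},
        "Brevinematales": {"Brevinemataceae": []},  # genera not listed in this section
        "Leptospiriales": {"Leptospiraceae": ["Leptospira"]},
    }
}


def lineage(genus, taxonomy=PROPOSED_TAXONOMY):
    """Return (class, order, family) for a genus, or None if it is not listed."""
    for cls, orders in taxonomy.items():
        for order, families in orders.items():
            for family, genera in families.items():
                if genus in genera:
                    return (cls, order, family)
    return None
```

For example, `lineage("Borrelia")` returns the class, order, and family under which the genus is placed in the proposal, while a genus outside the mapping returns `None`.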

EMENDED DESCRIPTION OF THE ORDER Spirochaetales (BUCHANAN, 1917)
The order contains two families, Spirochaetaceae and Borreliaceae, of which Spirochaetaceae is the type family. Organisms are helical or coccoid, 0.1-75 μm in diameter and 3.5-250 μm in length. Cells do not have hooked ends. Cells may possess flagella. Periplasmic flagella overlap in the central region of the cell. The diamino acid component of the peptidoglycan is L-ornithine. Anaerobic, facultatively anaerobic, or microaerophilic. Organisms are chemo-organotrophic and utilize carbohydrates or amino acids as carbon and energy sources. Both free living and host associated members. The G + C content of the DNA is 27-66 (mol%). The type genus is Spirochaeta (Ehrenberg, 1835).
Organisms from this order are distinguished from all other Bacteria by the conserved signature indels (CSIs) described in this report in the following proteins: Alanyl-tRNA synthetase, Phosphoribosylpyrophosphate synthetase, SecY preprotein translocase, peptide chain release factor 2, DNA mismatch repair protein MutS, and DNA mismatch repair protein MutL.

EMENDED DESCRIPTION OF THE FAMILY Spirochaetaceae
The family contains seven genera, Clevelandina, Diplocalyx, Hollandina, Pillotina, Sphaerochaeta, Spirochaeta, and Treponema, of which Spirochaeta is the type genus. Organisms are helical or coccoid, 0.1-75 μm in diameter and 5-250 μm in length. Cells do not have hooked ends. Cells may possess flagella. Periplasmic flagella overlap in the central region of the cell. Cells can be anaerobic or facultatively anaerobic. The diamino acid component of the peptidoglycan is L-ornithine. Organisms are chemo-organotrophic and utilize carbohydrates or amino acids as carbon and energy sources. Both free living and host associated members. The G + C content of the DNA is 36-66 (mol%).
Organisms from this family are distinguished from all other bacteria by the CSIs described in this report in the following proteins: 6-phosphofructokinase (pyrophosphate), bifunctional Hpr kinase/phosphatase, and 30S ribosomal protein S13.

DESCRIPTION OF Borreliaceae fam. nov.
The family contains two genera, Borrelia and Cristispira, of which Borrelia is the type genus. Organisms are helical, 0.2-3 μm in diameter and 3-180 μm in length. Cells do not have hooked ends. Periplasmic flagella overlap in the central region of the cell. Cells are motile, host-associated, and microaerophilic. The diamino acid component of the peptidoglycan is L-ornithine. Organisms are chemo-organotrophic and utilize carbohydrates or amino acids as carbon and energy sources. The G + C content of the DNA is 27-32 (mol%).

DESCRIPTION OF Brachyspiriales ord. nov.
The order contains the type family Brachyspiraceae. Organisms are helical, 0.2-0.4 μm in diameter and 2-11 μm in length. Cell ends may be blunt or pointed and are not hooked. Periplasmic flagella overlap in the central region of the cell. Cells are motile, host-associated, and obligately anaerobic and aerotolerant. The diamino acid component of the peptidoglycan is L-ornithine. Organisms are chemo-organotrophic and utilize monosaccharides, disaccharides, the trisaccharide trehalose, and amino sugars as carbon and energy sources. The G + C content of the DNA is 24-28 (mol%). The type genus is Brachyspira (Hovind-Hougen et al., 1982).
Organisms from this order are distinguished from all other bacteria by the CSIs described in this report in the following proteins: Flagellar hook-associated protein FlgK, DNA polymerase I, Valyl-tRNA synthetase, ATP-dependent protease La, and Glutamyl-tRNA amidotransferase subunit B. The description of the family Brachyspiraceae is the same as that of the order Brachyspiriales.

DESCRIPTION OF Brevinematales ord. nov.
The description of the order Brevinematales is the same as the description of its type family, Brevinemataceae.

DESCRIPTION OF Leptospiriales ord. nov.
The order contains the type family Leptospiraceae. Organisms are helical, 0.1-0.3 μm in diameter and 2-11 μm in length. Cells have hooked ends. Periplasmic flagella do not overlap in the central region of the cell. Cells are motile. The diamino acid component of the peptidoglycan is α,ε-diaminopimelic acid. Obligately aerobic or microaerophilic. Organisms are chemo-organotrophic and utilize long-chain fatty acids or long-chain fatty alcohols as carbon and energy sources. Both free living and host associated members. The G + C content of the DNA is 33-55 (mol%). The type genus is Leptospira (Noguchi, 1917).
Organisms from this order are distinguished from all other Bacteria by the CSIs described in this report in the following proteins: 50S Ribosomal protein L14, 30S Ribosomal protein S2, Alanyl-tRNA synthetase, Flagellar basal-body rod protein FlgG, and Flagellar filament core protein FlaB. The description of the family Leptospiraceae is the same as that of the order Leptospiriales.
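The phenotypic characteristics given in the order descriptions above can be tabulated and compared against an isolate's observed traits. The sketch below is only an illustration of such a comparison and is not the demarcation method used in this report, which relies on CSIs and phylogenetic trees. The names `OrderProfile`, `PROFILES`, and `compatible_orders` are hypothetical; the values are transcribed from the descriptions above, and Brevinematales is omitted because its characteristics are not listed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderProfile:
    name: str
    hooked_ends: bool        # do cells have hooked ends?
    flagella_overlap: bool   # do periplasmic flagella overlap centrally?
    diamino_acid: str        # diamino acid component of the peptidoglycan
    gc_range: tuple          # (min, max) G + C content in mol%


# Values transcribed from the order descriptions in this section.
PROFILES = [
    OrderProfile("Spirochaetales", False, True, "L-ornithine", (27, 66)),
    OrderProfile("Brachyspiriales", False, True, "L-ornithine", (24, 28)),
    OrderProfile("Leptospiriales", True, False,
                 "alpha,epsilon-diaminopimelic acid", (33, 55)),
]


def compatible_orders(hooked_ends, flagella_overlap, gc_content):
    """Return names of orders whose described traits match the observations."""
    return [
        p.name
        for p in PROFILES
        if p.hooked_ends == hooked_ends
        and p.flagella_overlap == flagella_overlap
        and p.gc_range[0] <= gc_content <= p.gc_range[1]
    ]
```

Because the G + C ranges of Spirochaetales and Brachyspiriales overlap at 27-28 mol% and the two orders share the other tabulated traits, these characteristics alone cannot always separate them, which is consistent with the report's emphasis on CSIs as more definitive markers.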

ACKNOWLEDGMENTS
This work was supported by an MRI-ORF Water Round research grant.