Molecular signatures and phylogenomic analysis of the genus Burkholderia: proposal for division of this genus into the emended genus Burkholderia containing pathogenic organisms and a new genus Paraburkholderia gen. nov. harboring environmental species

The genus Burkholderia contains a large number of diverse species, including many clinically important organisms, phytopathogens, and environmental species. However, there is currently a paucity of biochemical or molecular characteristics that can reliably distinguish different groups of Burkholderia species. We report here the results of detailed phylogenetic and comparative genomic analyses of 45 sequenced species of the genus Burkholderia. In phylogenetic trees based on concatenated sequences of 21 conserved proteins, as well as in 16S rRNA gene sequence based trees, members of the genus Burkholderia grouped into two major clades. Within these main clades, a number of smaller clades were also clearly distinguished, including those corresponding to the clinically important Burkholderia cepacia complex (BCC) and the Burkholderia pseudomallei group. Our comparative analysis of protein sequences from Burkholderia spp. has identified 42 highly specific molecular markers in the form of conserved sequence indels (CSIs) that are uniquely found in a number of well-defined groups of Burkholderia spp. Six of these CSIs are specific for a group of Burkholderia spp. (referred to as Clade I in this work) that contains all clinically relevant members of the genus (viz. the BCC and the B. pseudomallei group) as well as the phytopathogenic Burkholderia spp. The second main clade (Clade II), which is composed of environmental Burkholderia species, is also distinguished by 2 identified CSIs that are specific for this group. Additionally, our work has identified multiple CSIs that clearly demarcate a number of smaller groups of Burkholderia spp., including 3 CSIs that are specific for the B. cepacia complex, 4 CSIs that are uniquely found in the B. pseudomallei group, 5 CSIs that are specific for the phytopathogenic Burkholderia spp., and 22 other CSIs that distinguish two groups within Clade II. 
The described molecular markers provide highly specific means for the demarcation of different groups of Burkholderia spp., and they also offer novel and useful targets for the development of diagnostic assays for the clinically important members of the BCC or the pseudomallei groups. Based on the results of phylogenetic analyses, the identified CSIs, and the pathogenicity profile of Burkholderia species, we are proposing a division of the genus Burkholderia into two genera. In this proposal, the emended genus Burkholderia will correspond to Clade I and will contain only the clinically relevant and phytopathogenic Burkholderia species. All other Burkholderia spp., which are primarily environmental, will be transferred to a new genus Paraburkholderia gen. nov.


INTRODUCTION
The genus Burkholderia is a morphologically, metabolically, and ecologically diverse group of gram-negative bacteria (Yabuuchi et al., 1992; Coenye and Vandamme, 2003; Mahenthiralingam et al., 2005; Palleroni, 2005; Compant et al., 2008). Burkholderia species are ubiquitous in the environment and inhabit a wide range of ecological niches, ranging from soil to the human respiratory tract (Coenye and Vandamme, 2003). A group of 17 closely related Burkholderia species, the Burkholderia cepacia complex (BCC), is responsible for prevalent and potentially lethal pulmonary infections in immunocompromised individuals, such as individuals with cystic fibrosis (Mahenthiralingam et al., 2002, 2005; Biddick et al., 2003; Hauser et al., 2011). Burkholderia pseudomallei, a Burkholderia species related to the BCC, is the causative agent of melioidosis, a potentially lethal septic infection that accounts for up to 20% of all community-acquired septicemias in some regions (White, 2003; Limmathurotsakul and Peacock, 2011). Other species related to the BCC are the causative agents of major infections in both animals (Burkholderia mallei) and plants (Burkholderia glumae and Burkholderia gladioli) (Whitlock et al., 2007; Nandakumar et al., 2009).
In spite of the large diversity and varied pathogenicity among the more than 70 members of the group, all Burkholderia species are currently placed within one genus (Coenye and Vandamme, 2003; Palleroni, 2005). The phylogeny and taxonomy of the genus Burkholderia are primarily defined on the basis of 16S rRNA sequence analysis (Yabuuchi et al., 1992; Palleroni, 2005; Yarza et al., 2008). The inferences obtained from 16S rRNA analysis have been further substantiated by other methods, including recA gene based analysis (Payne et al., 2005), acdS gene based analysis (Onofre-Lemus et al., 2009), DNA-DNA hybridization (Gillis et al., 1995), whole cell fatty acid analysis (Stead, 1992), multilocus sequence analysis (Tayeb et al., 2008; Spilker et al., 2009; Estrada-de los Santos et al., 2013), gene gain/loss analysis (Zhu et al., 2011), and whole genome phylogenetic analysis (Ussery et al., 2009; Segata et al., 2013). In many of these studies, the members of the genus Burkholderia can be divided into two or more distinct phylogenetic groups, one of which consists of members of the BCC and related species (Payne et al., 2005; Tayeb et al., 2008; Yarza et al., 2008; Spilker et al., 2009; Ussery et al., 2009; Gyaneshwar et al., 2011; Vandamme and Dawyndt, 2011; Zhu et al., 2011; Estrada-de los Santos et al., 2013; Segata et al., 2013). Although some features are shared among closely related groups of Burkholderia species, no known morphological, biochemical, or molecular characteristic is specific to the larger phylogenetic groups within the genus (e.g., the BCC and related species).
The advent of next generation sequencing methods has led to a rapid increase in the number of genome sequences available for bacterial species (Mardis, 2008). The availability of these sequences for members of the genus Burkholderia provides better means to evaluate the phylogenetic relationships among different species (Ciccarelli et al., 2006; Wu et al., 2009). Importantly, these large sequence data sets allow the use of comparative genomic techniques to discover novel molecular markers that can provide independent evidence for different phylogenetic groups within the genus Burkholderia (Gupta, 1998, 2014; Gao and Gupta, 2012). In this work, we describe one type of molecular marker, conserved sequence insertions or deletions (CSIs), which are uniquely present in protein sequences from a defined group of organisms and can be used to delineate different phylogenetic groups of Burkholderia species independently of traditional phylogenetic methods (Gupta, 1998, 2001; Gao and Gupta, 2012). Our comparative analysis of Burkholderia genomes has led to the identification of 42 unique CSIs that delineate different phylogenetic groups within the genus in clear molecular terms. A clade of Burkholderia containing the BCC and related organisms (Clade I) was supported by both phylogenetic evidence and 6 identified CSIs. We have also identified 3 CSIs specific for the BCC, 4 CSIs specific for the B. pseudomallei group, and 5 CSIs specific for the plant pathogenic Burkholderia spp. The remaining members of the genus Burkholderia formed another monophyletic clade (Clade II) in our phylogenetic trees, which was supported by 2 CSIs. Within Clade II, we identified two smaller clades of Burkholderia that were supported by 16 and 6 CSIs, respectively. 
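As a simple consistency check, the group-specific CSI counts stated in this section can be tallied. The dictionary keys below are shorthand labels introduced only for this sketch; the counts are the ones given in the text.

```python
# CSI counts per group as stated in this section (labels are illustrative shorthand).
csi_counts = {
    "Clade I": 6,
    "Clade Ia (BCC)": 3,
    "Clade Ib (B. pseudomallei group)": 4,
    "Clade Ic (phytopathogens)": 5,
    "Clade II": 2,
    "Clade IIa": 16,
    "Clade IIb": 6,
}

# The two Clade II subgroups together account for the "22 other CSIs".
clade_ii_subgroups = csi_counts["Clade IIa"] + csi_counts["Clade IIb"]
total = sum(csi_counts.values())

print(f"Clade II subgroup CSIs: {clade_ii_subgroups}")  # 22
print(f"Total CSIs: {total}")  # 42
```

The tally reproduces both the 22 CSIs distinguishing the two Clade II subgroups and the overall total of 42 CSIs.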
The grouping of members of the genus Burkholderia into at least two large, monophyletic groups has also been observed in a large body of prior phylogenetic research (Payne et al., 2005;Tayeb et al., 2008;Yarza et al., 2008;Spilker et al., 2009;Ussery et al., 2009;Gyaneshwar et al., 2011;Zhu et al., 2011;Estrada-de los Santos et al., 2013;Segata et al., 2013). Based on the phylogenetic evidence and our identified CSIs, we propose division of the genus Burkholderia into two genera: an emended genus Burkholderia containing clinically important and phytopathogenic members of the genus and a new genus Paraburkholderia gen. nov. harboring the environmental species.

PHYLOGENETIC ANALYSIS
A concatenated sequence alignment of 21 highly conserved proteins (viz. ArgRS, EF-G, GyrA, GyrB, Hsp60, Hsp70, IleRS, RecA, RpoB, RpoC, SecY, ThrRS, TrpS, UvrD, ValRS, 50S ribosomal proteins L1, L5 and L6, and 30S ribosomal proteins S2, S8 and S11) was used to perform phylogenetic analysis. Because they are present in most bacteria, these proteins have been extensively used in phylogenetic studies (Gupta, 1998, 2009; Kyrpides et al., 1999; Harris et al., 2003; Charlebois and Doolittle, 2004; Ciccarelli et al., 2006). The amino acid sequences of these conserved proteins were obtained from the NCBI database for all of the species/strains listed in Table 1, which include 45 sequenced species of the genus Burkholderia. In addition, three genomes from other members of the class Betaproteobacteria (viz. Cupriavidus necator N-1, Bordetella pertussis Tohama I, and Neisseria meningitidis MC58), serving as outgroups, were also retrieved from the NCBI database. Depending on genome availability, type strains were selected for most of the species. Multiple sequence alignments for these proteins were created using Clustal_X 1.83 and concatenated into a single alignment file (Jeanmougin et al., 1998). Poorly aligned regions were removed using Gblocks 0.91b, and the resulting alignment, which contained 7688 aligned characters, was used for phylogenetic analysis (Castresana, 2000). A maximum likelihood (ML) tree based on 100 bootstrap replicates of this alignment was constructed using MEGA 6.0 with the Jones-Taylor-Thornton substitution model (Jones et al., 1992; Tamura et al., 2013).
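The concatenation step can be illustrated with a minimal sketch. It assumes each per-protein alignment is already held in memory as a mapping from taxon name to aligned sequence (a hypothetical format; in practice the alignments would be parsed from Clustal_X output). Taxa absent from a given protein alignment are padded with gaps so that every row of the combined alignment has the same length. The function name and toy sequences are illustrative only, and the Gblocks trimming step is not reproduced here.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all rows equal length)


def concatenate_alignments(alignments: List[Alignment]) -> Alignment:
    """Join per-protein alignments end to end into one supermatrix (hypothetical helper)."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    concatenated: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        # Every sequence within one protein alignment must have the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad taxa missing from this protein's alignment with gap characters.
            concatenated[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in concatenated.items()}


# Toy example with two short, made-up "protein" alignments.
recA = {"B_cepacia": "MKV-LA", "B_graminis": "MKVQLA"}
gyrB = {"B_cepacia": "TTR", "B_graminis": "TSR", "C_necator": "TAR"}
supermatrix = concatenate_alignments([recA, gyrB])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
# B_cepacia MKV-LATTR
# B_graminis MKVQLATSR
# C_necator ------TAR
```

Padding with gaps, rather than dropping taxa that lack one protein, keeps all taxa in the analysis at the cost of some missing data.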
A maximum likelihood 16S rRNA gene sequence consensus tree was also created for 101 sequences, which included 97 representative strains from the genus Burkholderia and four outgroup sequences from the genera Cupriavidus and Ralstonia. The sequences used in the study were obtained from the Ribosomal Database Project (RDP III) (Cole et al., 2009) and NCBI. All sequences were aligned using MAFFT 7 (Katoh and Standley, 2013), and an ML tree based on 1000 bootstrap replicates of this alignment was constructed using the General Time Reversible model (Tavaré, 1986) in MEGA 6.0 (Tamura et al., 2013).
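The bootstrap replicates mentioned above are generated by resampling alignment columns with replacement; MEGA performs this internally. The sketch below only illustrates the resampling principle (the nonparametric bootstrap) on a toy alignment. The function name and sequences are hypothetical, and the subsequent ML tree inference on each replicate is not shown.

```python
import random
from typing import Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    # Draw `length` column indices uniformly, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    # Apply the same column indices to every taxon so columns stay intact.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


aln = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "AGGTTC"}
rng = random.Random(42)  # fixed seed for reproducibility
replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

Because whole columns are resampled, each replicate preserves the site patterns of the original alignment while varying their frequencies, which is what bootstrap support values summarize.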

IDENTIFICATION OF MOLECULAR MARKERS (CSIs)
BLASTp searches were conducted for all proteins from chromosomes 2 and 3 (accession numbers NC_008061 and NC_008061) of Burkholderia cenocepacia J2315 (Holden et al., 2009) to identify CSIs that are shared by different members of the genus Burkholderia. High-scoring homologs (E-values < 1e-20) from members of the genus Burkholderia and from outgroup species that appeared among the top hits were selected. Multiple sequence alignments were created using Clustal_X 1.83 (Jeanmougin et al., 1998). These alignments were visually inspected for insertions or deletions (indels) that were restricted to some or all members of the genus Burkholderia and were flanked on both sides by at least 5-6 conserved amino acid residues within the neighboring 30-40 amino acids. Indels that were not flanked by conserved regions were not evaluated further. The group specificity of indels meeting the above criteria was further evaluated by performing BLASTp searches with short sequence segments (60-100 amino acids long) containing the indel and its flanking conserved regions. The searches were conducted against the NCBI non-redundant (nr) database, and a minimum of 250 BLAST hits were examined for the presence or absence of each CSI. The results of these analyses were evaluated as described in detail in our recent work (Gupta, 2014). Signature files for the CSIs that were specific for members of the genus Burkholderia were created and formatted using the programs SIG_CREATE and SIG_STYLE (accessible from Gleans.net) as described by Gupta (2014). The sequence alignments presented here contain information for all detected insertions or deletions from the Burkholderia group of interest, but only for a limited number of outgroup species. Sequence information for different strains of the same species is not shown, but all strains exhibited a similar pattern. 
Lastly, unless otherwise indicated, the CSIs shown here are found only in the indicated groups, and similar CSIs were not detected among the top 250 BLAST hits for the query sequences.
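The alignment criteria described above (an indel restricted to a group of interest and flanked on both sides by conserved residues) can be expressed as a simple scan. This is a minimal sketch under our own assumptions, not the visual inspection procedure or the SIG_CREATE program used in this work: the alignment is a hypothetical in-memory dictionary, a "conserved" column is one in which all rows carry the same residue and no gap, and the default thresholds loosely mirror the 5-6 conserved residues within a 30-40 residue window stated in the text. The function name and sequences are illustrative.

```python
from typing import Dict, List, Set, Tuple


def find_candidate_indels(
    alignment: Dict[str, str],
    in_group: Set[str],
    min_flank_conserved: int = 5,
    flank_window: int = 30,
) -> List[Tuple[int, int, str]]:
    """Scan an MSA for indel blocks that separate in_group from all other rows.

    Returns (start, end, kind) with `end` exclusive. `kind` is "insertion" when only
    the in-group has residues in the block, "deletion" when only the in-group has gaps.
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    out_group = [t for t in taxa if t not in in_group]

    def column_state(i: int) -> str:
        in_gap = [alignment[t][i] == "-" for t in taxa if t in in_group]
        out_gap = [alignment[t][i] == "-" for t in out_group]
        if not any(in_gap) and all(out_gap):
            return "insertion"
        if all(in_gap) and not any(out_gap):
            return "deletion"
        return "other"

    def is_conserved(i: int) -> bool:
        residues = {alignment[t][i] for t in taxa}
        return len(residues) == 1 and "-" not in residues

    def flank_ok(lo: int, hi: int) -> bool:
        # Count fully conserved columns inside the (clipped) flanking window.
        cols = range(max(lo, 0), min(hi, length))
        return sum(is_conserved(i) for i in cols) >= min_flank_conserved

    hits = []
    i = 0
    while i < length:
        state = column_state(i)
        if state == "other":
            i += 1
            continue
        # Extend the block while consecutive columns show the same indel pattern.
        j = i
        while j < length and column_state(j) == state:
            j += 1
        # Require conserved residues on BOTH sides of the indel block.
        if flank_ok(i - flank_window, i) and flank_ok(j, j + flank_window):
            hits.append((i, j, state))
        i = j
    return hits


# Made-up alignment: two in-group rows carry one extra residue at column 6.
msa = {
    "Bcen": "MKLVAGTEQRSDL",
    "Bpse": "MKLVAGSEQRSDL",
    "Pgra": "MKLVAG-EQRSDL",
    "Cnec": "MKLVAG-EQRSDL",
}
print(find_candidate_indels(msa, {"Bcen", "Bpse"}))  # [(6, 7, 'insertion')]
```

Such a scan would only nominate candidates; as described above, each candidate still has to be checked for group specificity against a broad set of BLAST hits.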

BRANCHING PATTERN OF BURKHOLDERIA SPECIES IN CONCATENATED PROTEIN AND 16S rRNA TREES
Genome sequences of 45 species of Burkholderia were available from the NCBI genome database at the time of this work (NCBI, 2014). Some characteristics of these genomes are listed in Table 1.
The genome sizes of the sequenced Burkholderia species vary widely (from 3.75 to 11.29 Mb), and the numbers of proteins they encode vary in similar proportion. In this work, we have produced an ML phylogenetic tree based on the concatenated amino acid sequences of 21 conserved housekeeping and ribosomal proteins from 45 sequenced Burkholderia species (Figure 1). The Burkholderia species formed two large clades in the protein-based ML tree: one consisting of the BCC and related organisms (Clade I) and another comprising mainly environmental or poorly characterized Burkholderia species (Clade II). Within Clade I, three smaller, distinct clades are also observed. The first of these (Clade Ia) consists entirely of the sequenced BCC species, the second (Clade Ib) groups B. pseudomallei and closely related species, and the third (Clade Ic) consists of the plant pathogenic species B. glumae and B. gladioli. Clade II can also be divided into two smaller clades, Clade IIa and Clade IIb. Clade IIa is separated from Clade IIb by a long branch, suggesting that substantial genetic divergence has occurred between the two groups. In addition to the two main clades of Burkholderia, two species, Burkholderia sp. JPY347 and Burkholderia rhizoxinica, branched early in the tree and did not associate with either Clade I or Clade II.
We have also constructed a 16S rRNA based ML phylogenetic tree for 97 Burkholderia strains and candidate species (Figure 2). This tree showed broadly similar patterns to our protein-based phylogeny. A clade consisting of the BCC and related organisms (Clade I) was clearly resolved. The three subclades within Clade I, the BCC (Clade Ia), the B. pseudomallei group (Clade Ib), and the plant pathogenic species (Clade Ic), were well resolved, though some species exhibited aberrant branching (e.g., B. oklahomensis and B. pseudomultivorans). A large assemblage of the remaining Burkholderia species, roughly corresponding to Clade II in our concatenated protein tree, was also observed in the 16S rRNA tree. However, because a significant number of unsequenced Burkholderia species are present in the 16S rRNA database, it is difficult to accurately identify the groups within Clade II of the 16S rRNA tree that correspond to Clades IIa and IIb in the concatenated protein tree. Bootstrap support for branches in the 16S rRNA tree was also significantly lower than in the concatenated protein tree, indicating that some of the observed branching patterns may not be reliable. However, the clade consisting of the BCC and related organisms (Clade I) has strong bootstrap support and has been identified in a large number of previous 16S rRNA based phylogenetic studies (Yabuuchi et al., 1992; Palleroni, 2005; Yarza et al., 2008; Suarez-Moreno et al., 2012).
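The bootstrap support discussed above is, in essence, the percentage of replicate trees that recover a given clade. The sketch below assumes, for illustration only, that each replicate tree has already been reduced to the set of clades it contains, with each clade written as a frozenset of taxon names (for unrooted trees one would compare bipartitions instead). The function name, taxa, and trees are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_support(clade: Iterable[str], replicate_trees: List[Set[Clade]]) -> float:
    """Percentage of bootstrap replicate trees that recover `clade` exactly."""
    target = frozenset(clade)
    if not replicate_trees:
        raise ValueError("no replicate trees supplied")
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Four toy replicate trees, each represented as the set of clades it contains.
clade_I = frozenset({"B_cepacia", "B_pseudomallei", "B_glumae"})
bcc_plus_bp = frozenset({"B_cepacia", "B_pseudomallei"})
trees = [
    {clade_I, bcc_plus_bp},
    {clade_I, bcc_plus_bp},
    {clade_I},
    {bcc_plus_bp, frozenset({"B_glumae", "P_graminis"})},
]
print(clade_support(clade_I, trees))  # 75.0
```

Under this definition, lower support in the 16S rRNA tree simply means that a clade is recovered in a smaller fraction of the resampled data sets.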

MOLECULAR SIGNATURES DISTINGUISHING THE CLADE I AND CLADE II BURKHOLDERIA
Rare genetic changes, such as insertions and deletions in essential genes/proteins, that occur in a common ancestor can be inherited by the various descendant species of that ancestor (Gupta, 1998; Rokas and Holland, 2000; Gogarten et al., 2002; Gupta and Griffiths, 2002). Because such changes are rare and are restricted to a related group of organisms, they can serve as important molecular markers and provide a novel means to understand the evolutionary interrelationships among closely related species (Gupta, 1998; Gupta and Griffiths, 2002; Gao and Gupta, 2012).
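The reasoning above, that an indel fixed in a common ancestor should be present in all of its descendants and absent from other taxa, can be sketched as a presence/absence check against a proposed clade. This is an illustrative sketch under simplifying assumptions: the function name, taxon labels, and presence patterns are hypothetical, and the check does not model secondary loss of an indel or lateral gene transfer.

```python
from typing import Dict, Set, Tuple


def evaluate_csi(
    presence: Dict[str, bool], clade: Set[str]
) -> Tuple[bool, Set[str], Set[str]]:
    """Compare a CSI's presence/absence pattern with a proposed clade.

    Returns (is_specific, surveyed clade members lacking the CSI, outsiders carrying it).
    """
    carriers = {taxon for taxon, has_csi in presence.items() if has_csi}
    surveyed = set(presence)
    missing_in_clade = (clade & surveyed) - carriers
    present_outside = carriers - clade
    is_specific = not missing_in_clade and not present_outside
    return is_specific, missing_in_clade, present_outside


clade_I = {"B_cenocepacia", "B_pseudomallei", "B_glumae"}
presence = {
    "B_cenocepacia": True,
    "B_pseudomallei": True,
    "B_glumae": True,
    "P_graminis": False,
    "C_necator": False,
}
print(evaluate_csi(presence, clade_I))  # (True, set(), set())

# A hypothetical homoplastic pattern: the indel also appears in an outgroup.
homoplastic = dict(presence, C_necator=True)
print(evaluate_csi(homoplastic, clade_I))  # (False, set(), {'C_necator'})
```

Only a pattern in which the carriers coincide exactly with the clade members is treated here as a clade-specific signature; any exception is reported rather than silently ignored.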
The comparative analysis of protein sequences from Burkholderia species carried out in the present work has identified a number of CSIs that clearly distinguish several clades within the genus Burkholderia. Six of these CSIs are specific for the Clade I Burkholderia, which consists of the BCC and related organisms, and they enable clear distinction of this group from all other Burkholderia. This clade, which contains all well-characterized pathogens within the genus, is the most clinically relevant group within Burkholderia. All species within this clade are potentially pathogenic to humans, animals, or plants, and most have been isolated from clinical human samples (Simpson et al., 1994; Mahenthiralingam et al., 2002, 2005; Biddick et al., 2003; O'Carroll et al., 2003). One example of a CSI that is specific to the Clade I Burkholderia is shown in Figure 3A. In this case, a one amino acid deletion is present in a highly conserved region of a periplasmic amino acid-binding protein. The indel is flanked on both sides by highly conserved regions, indicating that it is not the result of an alignment artifact and that it is a reliable genetic characteristic. This CSI is present in all of the sequenced members of the Clade I Burkholderia but absent in all other bacterial homologs of this protein. Our work has identified 5 additional CSIs in other widely distributed proteins that are specific for the Clade I Burkholderia; sequence alignments for these CSIs are shown in Supplemental Figures 1-5, and a summary of their characteristics is provided in Table 2. Two additional CSIs identified in this work are specific for the Clade II Burkholderia, which is made up mainly of environmental organisms.

www.frontiersin.org December 2014 | Volume 5 | Article 429
One of these CSIs, shown in Figure 3B, consists of a one amino acid insertion in a dehydrogenase protein that is uniquely found in members of the Clade II Burkholderia and absent in all other Burkholderia species as well as all other bacterial groups. A sequence alignment for another CSI that is specific for the Clade II Burkholderia (a 2 aa deletion in a LysR family transcriptional regulator) is shown in Supplemental Figure 6, and its characteristics are summarized in Table 2.

CSIs DISTINGUISHING DIFFERENT MAIN GROUPS WITHIN THE CLADE I BURKHOLDERIA
The species within Clade I of the genus Burkholderia are responsible for a range of human, animal, and plant diseases (Biddick et al., 2003; Mahenthiralingam et al., 2005). The members of Clade I (i.e., the BCC and related Burkholderia) are commonly separated into 3 main groups, which correspond to clades identified in our phylogenetic trees. The first group, the members of the BCC (Clade Ia), are prevalent pathogens in cystic fibrosis patients; the second group, the B. pseudomallei group (Clade Ib), contains the causative agents of melioidosis and glanders; and the third group contains the plant pathogenic Burkholderia species (Clade Ic) (White, 2003; Mahenthiralingam et al., 2005; Whitlock et al., 2007; Nandakumar et al., 2009). Our analysis has identified 3 CSIs that are specific for all members of the BCC clade (Clade Ia). One example of a BCC clade specific CSI is shown in Figure 4A. This CSI consists of a 2 amino acid insertion in a conserved region of a histidine utilization repressor and is found only in members of the BCC. Sequence alignments for the two other BCC clade specific CSIs are shown in Supplemental Figures 7 and 8, and their characteristics are summarized in Table 3.
Our work has also identified 4 CSIs that are specific for the B. pseudomallei group (Clade Ib) which contains the most prevalent human pathogen within the genus, B. pseudomallei (Wiersinga et al., 2006). One example of a CSI specific to the B. pseudomallei group, which consists of a 1 amino acid insertion in a conserved region of a periplasmic oligopeptide-binding protein, is shown in Figure 4B. Sequence alignments for three other CSIs in three different proteins that are specific for the B. pseudomallei group are shown in Supplemental Figures 9-11 and their characteristics are summarized in Table 3.
We have also identified 5 CSIs that are specific for the major plant pathogenic group within the genus Burkholderia (Clade Ic), which contains the species B. glumae and B. gladioli. An example of a CSI for this group is shown in Figure 4C. This CSI consists of a 1 amino acid insertion in a conserved region of an SMP-30/gluconolactonase/LRE-like region-containing protein; it is found in the members of Clade Ic of the genus Burkholderia but absent in all other Burkholderia and all other bacterial groups. Sequence alignments for the other 4 CSIs are shown in Supplemental Figures 12-15, and their key features are highlighted in Table 3.

CSIs THAT ARE SPECIFIC FOR TWO GROUPS WITHIN THE CLADE II BURKHOLDERIA
The species within Clade II of the genus Burkholderia inhabit a variety of environmental niches, but there is little evidence that they colonize healthy or immunocompromised human patients (Coenye and Vandamme, 2003). The branching of different groups within Clade II is not well resolved in 16S rRNA trees, and sufficient sequence data are currently lacking to generate trees based on concatenated gene sets that both reliably resolve the interrelationships within the clade and adequately reflect the total diversity of its species (Figures 1, 2) (Cole et al., 2009; NCBI, 2014). Despite the limited sequence data, we have been able to identify two robust groups within Clade II that are supported by a number of CSIs. The first of these, Clade IIa, primarily consists of unclassified members of the genus and Candidatus Burkholderia species (Figure 1). Clade IIa is supported by 16 CSIs identified in this work. One example of a CSI specific for Clade IIa, consisting of a 1 amino acid insertion in 3-phosphoglycerate dehydrogenase, is shown in Figure 5A. This insertion is present in a highly conserved region of this protein in all sequenced members of Clade IIa and absent in all other Burkholderia and all other bacterial groups. Sequence alignments for the other 15 CSIs that are specific for Clade IIa Burkholderia spp. are shown in Supplemental Figures 16-30, and their characteristics are summarized in Table 3.

FIGURE 4 | Partial sequence alignments of (A) a histidine utilization repressor showing a 2 amino acid insertion (boxed) identified in all members of the Burkholderia cepacia complex (Clade Ia) within the genus Burkholderia; (B) a periplasmic oligopeptide-binding protein showing a 1 amino acid insertion (boxed) identified in all members of the Burkholderia pseudomallei group (Clade Ib) within the genus Burkholderia; (C) an SMP-30/gluconolactonase/LRE-like region-containing protein showing a 1 amino acid insertion (boxed) identified in all members of the phytopathogenic Burkholderia clade (Clade Ic). These CSIs were not found in the sequence homologs of these proteins from any other sequenced bacteria in the top 250 BLAST hits. Sequence information for other CSIs specific to subclades within Clade I of the genus Burkholderia is presented in Supplemental Figures 7-15, and their characteristics are summarized in Table 3.
The second group within Clade II of the Burkholderia (Clade IIb) comprises a large variety of environmental Burkholderia species (Coenye and Vandamme, 2003; Suarez-Moreno et al., 2012). Our analysis has identified 6 CSIs that are specific to this large group of Burkholderia species. One example of a CSI specific to the members of Clade IIb of the genus Burkholderia is shown in Figure 5B. The CSI consists of a one amino acid insertion in 4-hydroxyacetophenone monooxygenase, which is present only in members of Clade IIb of the genus Burkholderia and not in protein homologs from any other sequenced bacterial group. Information for the other 5 CSIs that are specific to members of Clade IIb of the genus Burkholderia is shown in Supplemental Figures 31-35, and their characteristics are summarized in Table 3.

DISCUSSION
The genus Burkholderia is one of the largest groups of species within the class Betaproteobacteria (Palleroni, 2005; Parte, 2013). The genus contains a variety of bacteria that inhabit a wide range of ecological niches, including a number of bacteria with pathogenic potential (Yabuuchi et al., 1992; Coenye and Vandamme, 2003; Mahenthiralingam et al., 2005; Palleroni, 2005; Compant et al., 2008). The phylogeny of the genus Burkholderia has been studied using a wide array of methodologies based on phenotypic, biochemical, genetic, and genomic characteristics (Stead, 1992; Gillis et al., 1995; Payne et al., 2005; Tayeb et al., 2008; Onofre-Lemus et al., 2009; Spilker et al., 2009; Ussery et al., 2009; Gyaneshwar et al., 2011; Vandamme and Dawyndt, 2011; Zhu et al., 2011; Estrada-de los Santos et al., 2013). These studies have provided novel insights into the evolutionary relationships of the species within the genus Burkholderia. However, no taxonomic changes have been made to date because no discrete, distinguishing characteristics have been identified for the different phylogenetic lineages within the genus (Estrada-de los Santos et al., 2013).
In the present work, we have delineated two major groups of species within the genus Burkholderia: Clade I, which contains all pathogenic members of the genus, and Clade II, which contains a large variety of environmental species. These two groups branch distinctly in a highly resolved phylogenetic tree based on a large number of concatenated protein sequences produced in this work (Figure 1). Evidence for the distinctness of Clade I organisms from other Burkholderia species has been observed in a wide range of previous phylogenetic studies (Payne et al., 2005; Tayeb et al., 2008; Yarza et al., 2008; Spilker et al., 2009; Ussery et al., 2009; Gyaneshwar et al., 2011; Vandamme and Dawyndt, 2011; Zhu et al., 2011; Suarez-Moreno et al., 2012; Estrada-de los Santos et al., 2013; Segata et al., 2013). Importantly, we have also identified 6 and 2 CSIs that serve as discrete molecular characteristics of Clade I and Clade II, respectively (Figure 6 and Table 2). These CSIs are the first discrete features identified that are unique to either Clade I or Clade II of the genus Burkholderia. They act as independent verification of the phylogenetic trends identified in this and other studies and provide clear evidence that the species of Clade I are distinct from all other Burkholderia and are derived from a common ancestor exclusive of all other Burkholderia. Although sequence information for Clade II members is at present somewhat limited, the shared presence of two CSIs suggests that they are also likely derived from a common ancestor exclusive of other bacteria. Additionally, we have identified molecular evidence, in the form of large numbers of CSIs, that supports the distinctiveness of several smaller groups within the genus Burkholderia. The most important of these groups, the B. cepacia complex (BCC; Clade Ia) and the B. pseudomallei group (Clade Ib), are supported by 3 and 4 of the identified CSIs, respectively. The BCC are a group of opportunistic pathogens that colonize immunodeficient human hosts and cause some of the most prevalent and lethal infections in cystic fibrosis patients (Mahenthiralingam et al., 2002, 2005; Biddick et al., 2003; Hauser et al., 2011). The 17 species that make up the BCC are closely related and form a tight monophyletic cluster within the genus Burkholderia (Vandamme and Dawyndt, 2011). The B. pseudomallei group consists of 4 closely related species: B. pseudomallei, the causative agent of the highly lethal septicemia melioidosis (White, 2003; Limmathurotsakul and Peacock, 2011); B. mallei, the causative agent of the equine disease glanders and of occasional human infections (Whitlock et al., 2007); and the largely non-pathogenic organisms Burkholderia thailandensis and Burkholderia oklahomensis (Deshazer, 2007).

The identified CSIs are highly specific characteristics of these two important pathogenic groups, and they provide novel and useful targets for the development of diagnostic assays for either the BCC or the B. pseudomallei group (Ahmod et al., 2011; Wong et al., 2014). We have also identified CSIs for three other groups within the genus Burkholderia: a group of plant pathogenic Burkholderia related to the BCC and the B. pseudomallei group (Clade Ic), a group containing unnamed and candidate Burkholderia species (Clade IIa), and a group consisting of environmental Burkholderia (Clade IIb). We have identified 5, 16, and 6 CSIs for these three groups, respectively. These CSIs provide important differentiating characteristics for these groups, particularly for Clades IIa and IIb, which are related groups that have no other identified differentiating characteristics (Suarez-Moreno et al., 2012).

Frontiers in Genetics | Evolutionary and Genomic Microbiology

FIGURE 5 | Partial sequence alignments of (A) 3-phosphoglycerate dehydrogenase showing a 1 amino acid insertion (boxed) identified in all members of Clade IIa of the genus Burkholderia; (B) 4-hydroxyacetophenone monooxygenase showing a 1 amino acid insertion (boxed) identified only in members of Clade IIb of the genus Burkholderia. These CSIs were not found in the sequence homologs of these proteins from any other sequenced bacteria in the top 250 BLAST hits. Sequence information for other CSIs specific to subclades within Clade II of the genus Burkholderia is presented in Supplemental Figures 16-35, and their characteristics are summarized in Table 3.

FIGURE 6 | A summary diagram depicting the distribution of identified CSIs and the proposed names of the two major groups (Clade I and II) within Burkholderia. The major Burkholderia clades are indicated by brackets and highlighting.
The phylogenetic analyses, identified CSIs, and pathogenic characteristics of the different Burkholderia species presented in this work strongly suggest that the genus Burkholderia is made up of at least two distinct lineages: one consisting of the BCC and related organisms (Clade I) and another consisting of a wide range of environmental organisms (Clade II). The latter clade is phylogenetically highly diverse, and there is a paucity of sequence information available for its members. Thus, it may in the future be found to consist of more than one distinct bacterial lineage; however, it is currently clear that Clade I and Clade II represent distinct lineages. Evidence for the distinctness of the Clade I members from other Burkholderia species has been identified in a number of previous phylogenetic studies (Payne et al., 2005; Tayeb et al., 2008; Yarza et al., 2008; Spilker et al., 2009; Ussery et al., 2009; Gyaneshwar et al., 2011; Vandamme and Dawyndt, 2011; Zhu et al., 2011; Suarez-Moreno et al., 2012; Estrada-de los Santos et al., 2013; Segata et al., 2013). Estrada-de los Santos et al. (2013) recently completed a phylogenetic analysis of the genus Burkholderia using multilocus sequence analysis of the atpD, gltB, lepA, and recA genes in combination with the 16S rRNA gene, which provides compelling evidence for the presence of two distinct evolutionary lineages within the genus Burkholderia. However, these authors refrained from formally proposing a division of the genus into two genera because of a paucity of differentiating characteristics for the two groups. In addition to the phylogenetic evidence, our comparative analysis of Burkholderia genomes has identified a set of distinctive molecular characteristics that clearly differentiate the two evolutionary lineages within the genus Burkholderia. 
In light of the abundant phylogenetic and molecular evidence for two distinct evolutionary lineages within the genus Burkholderia, and the distinct pathogenicity profiles of the members of these two groups, we propose that the genus Burkholderia be divided into two separate genera. The first of these monophyletic genera, which comprises all the clinically relevant species and is clearly distinguished from all other Burkholderia species, will retain the name Burkholderia (Clade I).
For the remainder of the Burkholderia species (Clade II), which include a wide range of environmental species, we propose the name Paraburkholderia gen. nov. An emended description of the genus Burkholderia and a description of Paraburkholderia gen. nov. are provided below. Brief descriptions of the new species combinations within Paraburkholderia gen. nov. are presented in Table 4.

EMENDED DESCRIPTION OF THE GENUS BURKHOLDERIA (Yabuuchi et al., 1993 EMEND. Gillis et al., 1995)
The type species of the genus is B. cepacia. The species of this genus are gram-negative, straight or slightly curved rods that are motile by means of one or more polar flagella; only B. mallei lacks flagella and is nonmotile. The species do not produce sheaths or prosthecae and do not go through any resting stages. Most species are able to accumulate and utilize poly-β-hydroxybutyrate (PHB) for growth. The species are mostly aerobic chemoorganotrophs, but some are capable of anaerobic respiration using nitrate as the terminal electron acceptor. The G+C content of the members of the genus ranges from 65.7 to 68.5%. The members of the genus form a distinct monophyletic clade in phylogenetic trees, and they are distinguished from all other bacteria by the conserved sequence indels reported in this work in the following proteins: a periplasmic amino acid-binding protein, 4-hydroxybenzoate 3-monooxygenase, 6-phosphogluconate dehydrogenase, sarcosine oxidase subunit alpha, a putative lipoprotein, and a putative lyase (Table 2).

DESCRIPTION OF THE GENUS PARABURKHOLDERIA GEN. NOV.
The type species of the genus is Paraburkholderia graminis comb. nov. (basonym: Burkholderia graminis, Viallard et al., 1998). The species of this genus are gram-negative, straight or slightly curved rods with one or more polar flagella. Other morphological and metabolic characteristics are similar to those of the genus Burkholderia. The G+C content of the members of the genus ranges from 61.4 to 65.0%. The species are not associated with humans. The members of this genus generally form a distinct clade adjacent to the genus Burkholderia in phylogenetic trees, and they lack the molecular signatures that are specific for Burkholderia. Most of the sequenced members of this genus contain the conserved sequence indels reported in this work in an unnamed dehydrogenase and a LysR family transcriptional regulator (Table 2).

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00429/abstract