Comparative Genomics of Pandoraea, a Genus Enriched in Xenobiotic Biodegradation and Metabolism

Comparative analysis of partial gyrB, recA, and gltB gene sequences of 84 Pandoraea reference strains and field isolates revealed several clusters that included no taxonomic reference strains. The gyrB, recA, and gltB phylogenetic trees were used to select 27 strains for whole-genome sequence analysis and for a comparative genomics study that also included 41 publicly available Pandoraea genome sequences. The phylogenomic analyses included a Genome BLAST Distance Phylogeny approach to calculate pairwise digital DNA–DNA hybridization values and their confidence intervals, average nucleotide identity analyses using the OrthoANIu algorithm, and a whole-genome phylogeny reconstruction based on 107 single-copy core genes using bcgTree. These analyses, along with subsequent chemotaxonomic and traditional phenotypic analyses, revealed the presence of 17 novel Pandoraea species among the strains analyzed, and allowed the identification of several unclassified Pandoraea strains reported in the literature. The genus Pandoraea has an open pan genome that includes many orthogroups in the ‘Xenobiotics biodegradation and metabolism’ KEGG pathway, which likely explains the enrichment of these species in polluted soils and participation in the biodegradation of complex organic substances. We propose to formally classify the 17 novel Pandoraea species as P. anapnoica sp. nov. (type strain LMG 31117T = CCUG 73385T), P. anhela sp. nov. (type strain LMG 31108T = CCUG 73386T), P. aquatica sp. nov. (type strain LMG 31011T = CCUG 73384T), P. bronchicola sp. nov. (type strain LMG 20603T = ATCC BAA-110T), P. capi sp. nov. (type strain LMG 20602T = ATCC BAA-109T), P. captiosa sp. nov. (type strain LMG 31118T = CCUG 73387T), P. cepalis sp. nov. (type strain LMG 31106T = CCUG 39680T), P. commovens sp. nov. (type strain LMG 31010T = CCUG 73378T), P. communis sp. nov. (type strain LMG 31110T = CCUG 73383T), P. eparura sp. nov. (type strain LMG 31012T = CCUG 73380T), P. horticolens sp. nov. (type strain LMG 31112T = CCUG 73379T), P. iniqua sp. nov. (type strain LMG 31009T = CCUG 73377T), P. morbifera sp. nov. (type strain LMG 31116T = CCUG 73389T), P. nosoerga sp. nov. (type strain LMG 31109T = CCUG 73390T), P. pneumonica sp. nov. (type strain LMG 31114T = CCUG 73388T), P. soli sp. nov. (type strain LMG 31014T = CCUG 73382T), and P. terrigena sp. nov. (type strain LMG 31013T = CCUG 73381T).


INTRODUCTION
Members of the genus Pandoraea have emerged as rare opportunistic pathogens in persons with cystic fibrosis (Jørgensen et al., 2003; Johnson et al., 2004; Pimentel and MacLeod, 2008; Kokcha et al., 2013; Ambrose et al., 2016; Martina et al., 2017; See-Too et al., 2019) and several cases of chronic colonization and patient-to-patient transfer in this patient group have been reported (Jørgensen et al., 2003; Atkinson et al., 2006; Degand et al., 2015; Pugès et al., 2015; Ambrose et al., 2016; Dupont et al., 2017; Greninger et al., 2017). In addition to causing infection in cystic fibrosis patients, Pandoraea isolates have been recovered from blood and from samples from patients with chronic obstructive pulmonary disease or chronic granulomatous disease (Coenye et al., 2000; Daneshvar et al., 2001). Although the small number of patients involved and underlying diseases make it difficult to identify these bacteria as the cause of clinical deterioration (Martina et al., 2017; Green and Jones, 2018), one report described sepsis, multiple organ failure and death in a non-cystic fibrosis patient who underwent lung transplantation for sarcoidosis (Stryjewski et al., 2003).
The genome sequences of several strains with bioremediation potential have been reported, but a growing number of studies fail to provide species-level identification of such strains (Pushiri et al., 2013; Chan et al., 2015; Kumar M. et al., 2016; Crofts et al., 2017; Liu et al., 2018; Wu et al., 2019). In addition, in our studies on the diversity and epidemiology of opportunistic pathogens in persons with cystic fibrosis, we isolated a considerable number of Pandoraea strains that represent novel species (unpublished data). The present study aimed to clarify the taxonomy and formally name these novel Pandoraea species, and to make reference cultures and whole-genome sequences of each of these versatile bacteria publicly available.

Bacterial Strains and Growth Conditions
Isolates representing novel Pandoraea species are listed in Table 1, along with their isolation source details. These strains were initially assigned to the genus Pandoraea on the basis of sequence analysis of 16S rRNA, gyrB or recA genes (data not shown). Well-characterized reference strains and recent field isolates identified in the present study as established Pandoraea species are listed in Supplementary Table S1. Strains were grown aerobically on Tryptone Soya Agar (Oxoid) and incubated at 28 °C. Cultures were preserved in MicroBank™ vials at −80 °C.

DNA Preparation
DNA was extracted using an automated Maxwell® DNA preparation instrument (Promega, United States). The final extract was treated with RNase (2 mg/ml, 5 µL per 100 µL extract) and incubated at 37 °C for 1 h. DNA quality was checked using 1% agarose gel electrophoresis and DNA quantification was performed using the QuantiFluor ONE dsDNA system and the Quantus fluorometer (Promega, United States). DNA was stored at −20 °C prior to further analysis.

Single Locus Sequence Analyses
Nearly complete 16S rRNA sequences were obtained as described previously (Peeters et al., 2013). Partial recA gene sequences (663 bp) were amplified by PCR using forward primer 5′-AGG ACG ATT CAT GGA AGA WAG C-3′ and reverse primer 5′-GAC GCA CYG AYG MRT AGA ACT T-3′ (Spilker et al., 2009). Each 25 µL PCR reaction consisted of 1× PCR buffer (Qiagen), 1 U of Taq polymerase (Qiagen), 250 µM of each dNTP (Applied Biosystems), 1× Q-solution (Qiagen), 1 µM of each primer and 2 µL of DNA (Peeters et al., 2013). PCR was performed using a Veriti 96 Well Thermal Cycler (Applied Biosystems). Initial denaturation for 2 min at 94 °C was followed by 30 cycles of 30 s at 94 °C, 45 s at 58 °C and 1 min at 72 °C, and a final elongation for 10 min at 72 °C. Amplicons were purified using a NucleoFast 96 PCR clean-up kit (Macherey-Nagel). Sequencing primers (one per sequencing reaction) were the same as the amplification primers. Sequence analysis was performed with an Applied Biosystems 3130xl Genetic Analyzer and protocols of the manufacturer using the BigDye Terminator Cycle Sequencing Ready kit. Sequence assembly was performed using BioNumerics v7.6 (Applied Maths, Belgium).
Partial gyrB sequences (573 bp) were amplified by PCR using forward primer 5′-GAC AAY GGB CGY GGV RTB CC-3′ (this study) and reverse primer 5′-YTC GTT GWA RCT GTC GTT CCA CTG C-3′ (Spilker et al., 2009). The PCR protocol was the same as for recA, except that 2 µM of primer and an annealing temperature of 60 °C were used. Sequencing primers (one per sequencing reaction) were 5′-ACG ACA AGC ACG ARC CSA AGC G-3′ (this study) and the same reverse primer as for amplification. Sequence analysis and assembly were performed as described above for the recA gene.
Partial gltB sequences were amplified by PCR using forward primer 5′-CTG CAT CAT GAT GCG CAA GTG-3′ (Spilker et al., 2009) and reverse primer 5′-GTT GCC ACG GAA RTC GTT GG-3′ (this study). The PCR protocol was the same as for recA, except that 0.4 µM of primer was used. Sequencing primers (one per sequencing reaction) were the same as the amplification primers. Sequence analysis and assembly were performed as described above for the recA gene.
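Several of the primers listed above contain IUPAC ambiguity codes (W, Y, B, V, R, M, S), so each one is in practice a pool of unambiguous oligonucleotides. The following Python sketch is not part of the original protocol; it only illustrates how the size of that pool can be counted and enumerated. The function names `degeneracy` and `expand` are illustrative choices.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct unambiguous sequences encoded by a primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in bases))]


# gyrB forward primer from the text (spaces are ignored).
gyrb_forward = "GAC AAY GGB CGY GGV RTB CC"
print(degeneracy(gyrb_forward))  # 216

# recA forward primer from the text: a single W position, so two variants.
for variant in expand("AGG ACG ATT CAT GGA AGA WAG C"):
    print(variant)
```

A highly degenerate primer such as the gyrB forward primer dilutes each individual oligonucleotide in the pool, which is one plausible reason for using a higher primer concentration for that locus; the source does not state the rationale.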
Gene sequences of recA, gyrB, and gltB were aligned based on their amino acid sequences using MUSCLE (Edgar, 2004) in MEGA7. Phylogenetic trees were constructed using RAxML v8.2.11 (Stamatakis, 2014) with the GTRCAT substitution model and 1000 bootstrap replicates. Visualization and annotation of the phylogenetic trees were performed using iTOL (Letunic and Bork, 2016).

Whole-Genome Sequencing
The genome sequences of 27 strains (Table 2 and Supplementary Table S2) were determined using the Illumina HiSeq4000 platform (PE150) at the Oxford Genomics Centre. Quality reports were created by FastQC. Reads were trimmed using Trimmomatic (Bolger et al., 2014) with the MAXINFO:50:0.8 and MINLEN:50 options. Genome size was estimated using kmc (Kokot et al., 2017) and reads were subsampled with seqtk to 80x coverage depth for assembly. Assembly was performed using SPAdes v3.12.0 (Bankevich et al., 2012) with error correction, default k-mer sizes (21, 33, 55, 77) and mismatch correction. Contigs were filtered on length (minimum 500 bp) and coverage (minimum 0.5x and maximum 8x overall coverage). Raw reads were mapped against the assemblies using bwa mem (Li, 2013) and contigs were polished using Pilon 1.22 (Walker et al., 2014) with default parameters. Quast (Gurevich et al., 2013) was used to create quality reports of the resulting assemblies. Annotation was performed using Prokka 1.12 (Seemann, 2014) with a genus-specific database based on publicly available genomes.
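The arithmetic behind the subsampling and contig-filtering steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, the genome sizes in the usage lines are the assembly size extremes reported in this paper rather than kmc estimates, and the coverage bounds are read as multiples of the overall assembly coverage, which is one possible interpretation of the text.

```python
import math


def pairs_for_coverage(genome_size_bp, target_depth=80, read_len=150):
    """Read pairs needed so that 2 * read_len * pairs reaches
    target_depth * genome_size_bp (paired-end reads of equal length)."""
    return math.ceil(target_depth * genome_size_bp / (2 * read_len))


def keep_contig(length_bp, coverage, overall_coverage,
                min_len=500, min_rel=0.5, max_rel=8.0):
    """Length and coverage filter for one contig.

    Assumption: the 0.5x and 8x bounds are relative to the overall
    coverage of the assembly.
    """
    relative = coverage / overall_coverage
    return length_bp >= min_len and min_rel <= relative <= max_rel


# Read pairs for 80x depth with PE150 reads at 4.86 and 6.45 Mbp.
print(pairs_for_coverage(4_860_000))  # 1296000
print(pairs_for_coverage(6_450_000))  # 1720000

# A short contig and a strongly over-covered contig are both rejected.
print(keep_contig(400, 80, 80))    # False
print(keep_contig(1200, 900, 80))  # False
print(keep_contig(1200, 75, 80))   # True
```

The resulting pair count is the kind of value that could be passed to a read subsampler; trimmed reads are shorter than 150 bp on average, so the realized depth would be somewhat below the target.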

Publicly Available Genomes
All 41 publicly available (January 29, 2019) whole-genome sequences of Pandoraea bacteria were downloaded from the NCBI database (Table 2). Burkholderia cenocepacia J2315T was used as an outgroup in the phylogenomic analyses. For strains B-6, E26, PE-S2R-1 and PE-S2T-3 (Crofts et al., 2017) no annotation was available and therefore annotation was performed using Prokka as described above.

Phylogenomic Analyses
The Genome BLAST Distance Phylogeny (GBDP) approach was used to calculate pairwise digital DNA-DNA hybridization (dDDH) values and their confidence intervals (formula 2) using the Genome-to-Genome Distance Calculator (GGDC 2.1) under recommended settings (Meier-Kolthoff et al., 2013). Average nucleotide identity (ANI) values were calculated with the OrthoANIu algorithm (Yoon et al., 2017). Whole-genome phylogeny was assessed based on 107 single-copy core genes found in a majority of bacteria (Dupont et al., 2012) using bcgTree (Ankenbrand and Keller, 2016). Visualization and annotation of the phylogenetic tree were performed using iTOL (Letunic and Bork, 2016).

Functional Genome Analyses
To enable a comparative genomic study, each protein-coding gene (CDS) in the 68 Pandoraea genomes (n = 331,123) was functionally classified using the COG (Galperin et al., 2015) and KEGG orthologies (Kanehisa and Goto, 2000; Kanehisa et al., 2017). COGs were assigned by a reversed position-specific BLAST (RPSBLAST v2.6.0+) with an e-value cut-off of 1E-3 against the NCBI conserved domain database (CDD v3.16) (Tange, 2011). KEGG orthology was inferred using the KEGG automated annotation server (KAAS) (Moriya et al., 2007). Based on COG and K numbers, each CDS was assigned to the respective COG category and KEGG hierarchy. In case multiple COG categories were defined for the same COG, the first category was considered as the primary category. Protein orthologous groups (orthogroups) were inferred using OrthoFinder v2.2.7 (Emms and Kelly, 2015) with default parameters. For each orthogroup, we mapped the genomes and species in which it was present, the specificity (core, multiple species, single species or single isolate), and COG and KEGG functional classification.
Data mapping, visualization and statistical analyses were performed using RStudio with R v3.5.2. Pearson's chi-square analyses were used to test the association between different sets of categorical variables. When a significant relationship was found between two variables, we further examined the standardized Pearson residuals. Standardized Pearson residuals with high absolute values indicate a lack of fit of the null hypothesis of independence in each cell (Agresti, 2002) and thus indicate observed cell frequencies in the contingency table that are significantly higher or lower than expected based on coincidence.
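The test statistic and the standardized Pearson residuals described above can be written out in a few lines. The sketch below is in Python rather than the R used in the study, and the contingency table in the usage lines is hypothetical. The standardized residual for cell (i, j) is taken as (O − E) / sqrt(E (1 − row_i/n) (1 − col_j/n)), following the usual textbook definition.

```python
import math


def chi_square_with_residuals(table):
    """Pearson chi-square statistic, degrees of freedom and standardized
    Pearson residuals for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Standard error of (observed - expected) under independence.
            se = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            res_row.append((observed - expected) / se)
        residuals.append(res_row)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


# Hypothetical counts: rows are two functional categories,
# columns are core versus non-core orthogroups.
table = [[120, 30], [10, 90]]
chi2, df, residuals = chi_square_with_residuals(table)
print(round(chi2, 1), df)
for row in residuals:
    print([round(r, 2) for r in row])
```

Cells with large positive residuals are overrepresented relative to independence and cells with large negative residuals are underrepresented; a p-value would additionally require the chi-square distribution function, which is omitted here to keep the sketch dependency-free.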

DNA Base Composition
The G + C content of all strains was calculated from their genome sequences using Quast (Gurevich et al., 2013).
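For reference, the G + C content of an assembly is the fraction of G and C among all unambiguous bases. The sketch below shows that calculation on toy sequences; it is not Quast's implementation, and skipping ambiguous bases such as N is an assumption made here.

```python
def gc_content(contigs):
    """G + C content (mol%) over all contigs, ignoring ambiguous bases."""
    gc = 0
    at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Toy contigs: 6 of the 8 unambiguous bases are G or C.
print(gc_content(["GGCC", "ATGCNN"]))  # 75.0
```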

Biochemical Characterization
Biochemical characterization was performed as described previously (Draghi et al., 2014).

Fatty Acid Methyl Ester Analysis
After a 24 h incubation period at 28 °C on Tryptone Soya Agar (BD), a loopful of well-grown cells was harvested and fatty acid methyl esters were prepared, separated and identified using the Microbial Identification System (Microbial ID) as described previously (Vandamme et al., 1992).

Single Locus Sequence Analyses
The 16S rRNA gene sequences determined in the present study are publicly available through the GenBank/EMBL/DDBJ accession numbers listed in the species descriptions. Because the 16S rRNA sequences of Pandoraea species show high levels of similarity (Coenye et al., 2000; Daneshvar et al., 2001), gyrB gene sequence analysis has been introduced for species-level identification of Pandoraea isolates (Coenye and LiPuma, 2002). To provide more robust phylogenetic analysis, partial sequences of the gyrB gene, and also of the recA and gltB genes, were generated for a total of 84 Pandoraea reference strains and field isolates, and were used to construct phylogenetic trees (Figure 1 and Supplementary Figures S1, S2). The gyrB, recA and gltB sequences determined in the present study are publicly available through the GenBank/EMBL/DDBJ accession numbers listed in Figure 1 and Supplementary Figures S1, S2 and in the species descriptions.
FIGURE 1 | Phylogenetic tree based on partial gyrB sequences of all Pandoraea strains examined. Sequences (495-573 bp) were aligned based on their amino acid sequences and phylogeny was inferred using the Maximum Likelihood method and GTRCAT substitution model in RAxML. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches if greater than 50%. Burkholderia cenocepacia J2315T was used as outgroup. The scale bar indicates the number of substitutions per site. Isolates selected for whole-genome sequencing are shown in bold character type.
FIGURE 2 | Phylogenetic tree based on 107 single-copy core genes. BcgTree was used to extract the nucleotide sequence of 107 single-copy core genes and to construct their phylogeny by partitioned maximum-likelihood analysis. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. Burkholderia cenocepacia J2315T was used as outgroup. Bar, 0.01 changes per nucleotide position.
Overall, the three phylogenetic trees had comparable topologies, but while taxonomic reference strains of established Pandoraea species (Supplementary Table S1) and several groups of field isolates formed well-delineated clusters, others did not (Figure 1 and Supplementary Figures S1, S2). Each of these phylogenetic trees was therefore used to select a total of 27 isolates (shown in bold character type in Figure 1 and Supplementary Figures S1, S2) for whole-genome sequence analysis. These included 6 isolates that were tentatively assigned to established Pandoraea species using single locus sequence analyses, 20 isolates that clustered separately or whose assignment was equivocal, and P. terrae LMG 30175T, the sole Pandoraea type strain for which there was no publicly available whole-genome sequence at the time of writing.
Tr, trace amount (<1%); ND, not detected. *Summed feature 2 comprises iso-C16:1 I and/or C14:0 3-OH; summed feature 3 comprises iso-C15:0 2-OH and/or C16:1 ω7c.

Genome Characteristics
The assembly of the Illumina HiSeq 150 bp paired-end reads resulted in assemblies with 12-113 contigs and a total of 4.86-6.45 Mbp (Table 2 and Supplementary Table S2). The number of predicted CDS in the newly sequenced genomes ranged from 4,266 to 5,652 (Table 2). No clustered regularly interspaced short palindromic repeats (CRISPRs) were identified. The annotated assemblies of these 27 genomes were submitted to the European Nucleotide Archive and are publicly available through the GenBank/EMBL/DDBJ accession numbers listed in Table 2 and in the species descriptions. The G + C content of the newly sequenced strains, as calculated from their genome sequences, ranged from 62.3 to 66.1 mol% (Table 2). These values are similar to those of other Pandoraea genomes, except for Ca. Pandoraea novymonadis, which has a G + C content of 43.8% (Kostygov et al., 2017).

Phylogenomic Analyses
The 27 genomes from the present study were compared to all 41 publicly available Pandoraea genomes (GenBank database, January 29, 2019), which included 6 unclassified Pandoraea strains ( Pandoraea novymonadis, a total of 17 novel species for which we propose the names shown in Table 1, and a novel species represented by strains PE-S2R-1 and PE-S2T-3 (Crofts et al., 2017) (see below). One of these novel species, i.e., Pandoraea cepalis, corresponds with Pandoraea genomospecies 1, which we reported earlier (Coenye et al., 2000). Two novel species, i.e., Pandoraea capi and Pandoraea bronchicola, correspond with Pandoraea genomospecies 3 and 4, respectively, reported by Daneshvar et al. (2001). Finally, the phylogenomic data (Figure 2 and Supplementary Tables S3, S4), but also each of the single locus sequence analyses, showed that Pandoraea genomospecies 2 LMG 20602 should be classified as P. sputorum, which contradicts earlier wet-lab DNA-DNA hybridization results (Daneshvar et al., 2001). The use of dDDH and ANI threshold levels was generally straightforward, yet some pairs of strains showed values close to the generally applied taxonomic threshold levels (Supplementary Tables S3, S4) (Meier-Kolthoff et al., 2013; Yoon et al., 2017). The two strains classified as P. capi showed 96.4% ANI and 69.6% dDDH with a dDDH confidence interval of 66.6-72.5%, and these strains were therefore classified as the same species. Similarly, the three strains classified as P. cepalis showed 96.2-98.4% ANI, 68.4-86.0% dDDH, and the 70% dDDH threshold level was in the confidence interval; these strains were therefore classified as one species. P. soli LMG 31014T showed 95.0-95.8% ANI and 60.7-65.0% dDDH toward P. cepalis strains, and the 70% dDDH threshold level was not part of the confidence interval, so this strain was classified as a separate species. Similarly, P. horticolens LMG 31112T showed 95.0-95.3% ANI and 60.0-62.2% dDDH toward P. communis, and the 70% dDDH threshold level was not part of the confidence interval so this strain was also classified as a separate species.
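The decision rule applied to the borderline cases above can be summarized in a short sketch: a pair of genomes is assigned to the same species when the dDDH point estimate reaches the 70% threshold or when the threshold lies inside the dDDH confidence interval. The function name is hypothetical, and the second pair in the usage lines uses made-up confidence limits purely for illustration, since the text reports only that the threshold fell outside the interval.

```python
def same_species(ddh, ci_low, ci_high, threshold=70.0):
    """Return True if a genome pair is assigned to one species.

    The pair qualifies when the dDDH point estimate reaches the threshold
    or when the threshold falls within the confidence interval.
    """
    return ddh >= threshold or ci_low <= threshold <= ci_high


# P. capi pair from the text: 69.6% dDDH, confidence interval 66.6-72.5%.
print(same_species(69.6, 66.6, 72.5))  # True

# Hypothetical pair whose interval excludes the 70% threshold.
print(same_species(62.0, 59.1, 64.9))  # False
```

In the study this rule was read alongside the ANI values and the phylogenomic tree, so the function should be seen as one input to the classification rather than the whole of it.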
The phylogenomic tree based on 107 single-copy marker genes was well resolved and the clusters delineated by dDDH and ANI formed monophyletic groups with a high bootstrap support (Figure 2). The clades in the phylogenomic tree of the present study showed a branching order similar to a previously published tree based on 119 conserved proteins (Kostygov et al., 2017). The results of the phylogenomic analyses along with the clustering in the individual recA, gyrB, and gltB single locus sequence analyses (Figure 1 and Supplementary Figures S1, S2) were used to identify each of the 84 isolates included in the present study. P. sputorum strain LMG 31121 clustered with the remaining P. sputorum strains in the gyrB and gltB trees but grouped aberrantly in the recA tree. In addition, P. cepalis proved particularly difficult to identify through single locus sequence analysis as it exhibited more variation in each of the sequences examined (Figure 1 and Supplementary Figures S1, S2) than any other Pandoraea species.
FIGURE 5 | Orthogroup specificity varies among COG categories. Bar plot shows the number of orthogroups and their specificity per COG category [χ²(66) = 522, p < 0.001]. J, translation, ribosomal structure and biogenesis; K, transcription; L, replication, recombination and repair; B, chromatin structure and dynamics; D, cell cycle control, cell division, chromosome partitioning; V, defense mechanisms; T, signal transduction mechanisms; M, cell wall/membrane/envelope biogenesis; N, cell motility; W, extracellular structures; U, intracellular trafficking, secretion, and vesicular transport; O, posttranslational modification, protein turnover, chaperones; X, mobilome: prophages, transposons; C, energy production and conversion; G, carbohydrate transport and metabolism; E, amino acid transport and metabolism; F, nucleotide transport and metabolism; H, coenzyme transport and metabolism; I, lipid transport and metabolism; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; R, general function prediction only; S, function unknown.

Phenotypic Characterization
The type strains of each of 11 established Pandoraea species and of 17 novel Pandoraea species reported in the present study were included in an extensive phenotypic characterization. Among Pandoraea species, P. thiooxydans not only occupies a separate phylogenetic position (Figures 1, 2 and Supplementary Figures S1, S2) but also has a distinctive phenotype (Table 3). While all other Pandoraea species show normal growth on general microbiological growth media (i.e., they generate colonies of 1-4 mm in diameter after 2 days of incubation at 37 °C), P. thiooxydans LMG 24779T requires prolonged incubation of up to 7 days before the same colony size is obtained.

Functional Genome Analyses
The 68 Pandoraea genomes in the present study comprised 331,123 CDS, of which 273,692 (83%) and 128,054 (39%) could be assigned to the COG and KEGG orthologies, respectively (Supplementary Table S5). Orthologous genes were identified to determine the conserved genome content of the genus Pandoraea. Ortholog analysis revealed 10,783 orthogroups (325,879 CDS) in total, of which 738 (51,633 CDS) were present in all genomes, 897 (62,692 CDS) were present in all genomes except Ca. Pandoraea novymonadis, 8,003 (207,937 CDS) were present in multiple species, 1,130 (3,581 CDS) were species-specific and 15 (36 CDS) were isolate-specific (Figure 3). For further analyses, the core orthogroups were defined as those present in all genomes or all genomes except Ca. Pandoraea novymonadis (n = 1,635). COG and KEGG could be assigned to 7,243 (67%) and 3,655 (34%) of a total of 10,783 orthogroups (Supplementary Table S6). A previous pan genome analysis of 36 Pandoraea genomes by Wu et al. (2019) revealed a core genome of 1,903 CDS. As shown by these authors, the pan genome of Pandoraea is open (Wu et al., 2019) and the number of core genes decreases with an increasing number of genomes analyzed.
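The specificity classes used above can be expressed as a simple rule on the set of genomes in which an orthogroup occurs. The sketch below is an illustration, not the authors' R code: the function name, the toy genome identifiers and the tie-breaking between "single species" and "single isolate" (an orthogroup found in exactly one genome is called isolate-specific) are assumptions. The final lines check that the counts reported in the text add up.

```python
def classify_orthogroup(genomes, species_of, optional=frozenset()):
    """Assign an orthogroup to a specificity class.

    genomes    -- set of genome ids in which the orthogroup is present
    species_of -- dict mapping every genome id to its species
    optional   -- genome ids that may be absent from a core orthogroup
                  (Ca. P. novymonadis in the study)
    """
    required = set(species_of) - set(optional)
    if set(genomes) >= required:
        return "core"
    species = {species_of[g] for g in genomes}
    if len(species) > 1:
        return "multiple species"
    if len(genomes) > 1:
        return "single species"
    return "single isolate"


# Toy data set with hypothetical genome ids.
species_of = {"g1": "A", "g2": "A", "g3": "B", "nov": "N"}
optional = {"nov"}
print(classify_orthogroup({"g1", "g2", "g3"}, species_of, optional))  # core
print(classify_orthogroup({"g1", "g3"}, species_of, optional))        # multiple species
print(classify_orthogroup({"g1", "g2"}, species_of, optional))        # single species
print(classify_orthogroup({"g1"}, species_of, optional))              # single isolate

# Consistency check on the orthogroup counts reported in the text.
counts = {
    "all genomes": 738,
    "all except Ca. P. novymonadis": 897,
    "multiple species": 8003,
    "species-specific": 1130,
    "isolate-specific": 15,
}
print(sum(counts.values()))  # 10783
print(counts["all genomes"] + counts["all except Ca. P. novymonadis"])  # 1635
```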
The frequency of orthologous versus non-orthologous CDS varied significantly per isolate [χ²(67) = 7423, p < 0.001] and species [χ²(29) = 5863, p < 0.001]. The number of non-orthologous CDS per genome ranged from 0 to 632, with P. terrae LMG 30175T showing the highest percentage of non-orthologous CDS (Figure 4 and Supplementary Table S7). To identify biological functions that were over- or underrepresented in the core genome, we looked at the COG and KEGG functional classification of the orthogroups versus their specificity (core, multiple species, single species or single isolate). The specificity of the orthogroups varied significantly among the COG categories [χ²(66) = 522, p < 0.001] and highest levels of the KEGG pathways [χ²(10) = 130, p < 0.001]. The core orthogroups were significantly enriched in the COG categories Translation, ribosomal structure and biogenesis (J), Posttranslational modification, protein turnover, chaperones (O), Nucleotide transport and metabolism (F) and Coenzyme transport and metabolism (H) (Figure 5 and Supplementary Table S8) and in the KEGG pathway Genetic Information Processing (09120) (Figure 6 and Supplementary Table S9).
Because many Pandoraea strains participate in the biodegradation of recalcitrant xenobiotics (Uhlik et al., 2012; Pushiri et al., 2013; Shi et al., 2013; Wang et al., 2015; Crofts et al., 2017; de Paula et al., 2017; Sarkar et al., 2017; Tirado-Torres et al., 2017; Kumar et al., 2018b; Yang et al., 2018; Liu et al., 2019; Wu et al., 2019), we specifically looked at the orthogroups in the KEGG pathway Xenobiotics biodegradation and metabolism (Figure 7). Most orthogroups in this pathway were present in multiple species (n = 28) and some were even present in the core Pandoraea genome (n = 6). This confirmed the potential of Pandoraea for degrading xenobiotics. In particular, the widespread capacity to utilize benzoate derivatives (Figure 7, pathways 362, 364, 627, and 633) explains why several strains have the potential to degrade lignin (Shi et al., 2013; Kumar et al., 2018a; Liu et al., 2019) and other aromatic compounds (Springael et al., 1996; Uhlik et al., 2012; Wang et al., 2015). Finally, P. fibrosis and P. thiooxydans showed a unique capacity to degrade specific compounds (Figure 7). P. fibrosis was only recently described and named after its origin from a cystic fibrosis patient (See-Too et al., 2019) but its unique capacity to degrade nitrotoluene derivatives is yet another example of the versatility in one Pandoraea species.

CONCLUSION
The present study extends the number of formally named Pandoraea species considerably and makes reference cultures and their whole-genome sequences publicly available. The genus Pandoraea further emerges as a group of environmental bacteria with strong biodegradation capacities and as opportunistic human pathogens, especially in persons with cystic fibrosis. Within this genus, P. thiooxydans, P. terrae and Candidatus P. novymonadis cluster outside the main Pandoraea lineage. The aberrant phylogenomic position of P. thiooxydans is further supported by a distinctive phenotype. The classification of these bacteria within this monophyletic genus could therefore be questioned.
Taking into account the source and identification of strains ISTKB (a rhizospheric soil isolate, Kumar M. et al., 2016) and B-6 (an eroded bamboo slip isolate, Liu et al., 2018), and, to be as comprehensive as possible, also some additional unpublished own data (JL and PV), the novel species P. aquatica, P. capi, P. cepalis, P. commovens, P. communis, and P. iniqua, but also the established species P. faecigallinarum, P. norimbergensis, P. pnomenusa, and P. fibrosis, have all been isolated from both human clinical and environmental sources. Thus far, the novel species P. anapnoica, P. anhela, P. bronchicola, P. captiosa, P. morbifera, P. nosoerga, and P. pneumonica, but also the established species P. apista, P. pulmonicola, and P. sputorum, have all been isolated from human clinical sources only; while the novel species P. eparura, P. horticolens, P. soli and P. terrigena, and the established species P. oxalativorans, P. terrae, P. thiooxydans, and P. vervacti have thus far been isolated from environmental samples only.
The present study provides genomic, chemotaxonomic and phenotypic data that enable a formal proposal of 17 novel Pandoraea species as outlined below. By making reference cultures and whole-genome sequences of each of these versatile bacteria publicly available, we aim to contribute to future knowledge about the metabolic versatility and pathogenicity of these organisms.
Description of Pandoraea anapnoica sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States.
The type strain is LMG 31117T (=CCUG 73385T) and was isolated from a cystic fibrosis specimen in the United States in 1999. Its G + C content is 62.4 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31117T are publicly available through the accession numbers LR536847, LR536866-LR536868, and CABPSP010000000, respectively.
Description of Pandoraea anhela sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States.
The type strain is LMG 31108T (=CCUG 73386T) and was isolated from a cystic fibrosis specimen in the United States in 2006. Its G + C content is 63.4 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31108T are publicly available through the accession numbers LR536848, LR536863-LR536865, and CABPSB010000000, respectively.
Description of Pandoraea aquatica sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States and from pond water in Belgium.
The type strain is LMG 31011T (=CCUG 73384T) and was isolated from pond water in a greenhouse in Belgium in 2013. Its G + C content is 62.9 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31011T are publicly available through the accession numbers LR536849, LR536869-LR536871, and CABPSN010000000, respectively.
Description of Pandoraea bronchicola sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States.
The type strain is LMG 20603T (=ATCC BAA-110T = CDC H652T) and was isolated from cystic fibrosis sputum in the United States in 1998. Its G + C content is 63.0 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 20603T are publicly available through the accession numbers LR536994, LR536872-LR536874, and CABPST010000000, respectively.
Description of Pandoraea capi sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States and from rhizospheric soil in India.
The type strain is LMG 20602T (=ATCC BAA-109T = CDC G9805T) and was isolated from sputum of a non-cystic fibrosis patient in the United States in 1996. Its G + C content is 63.4 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 20602T are publicly available through the accession numbers LR536850, LR536884-LR536886, and CABPRV010000000, respectively.
Description of Pandoraea captiosa sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States.
The type strain is LMG 31118T (=CCUG 73387T) and was isolated from a cystic fibrosis specimen in the United States in 2008. Its G + C content is 63.3 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31118T are publicly available through the accession numbers LR536851, LR536893-LR536895, and CABPSQ010000000, respectively.
Description of Pandoraea cepalis sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from soil and water samples in Belgium and the Netherlands, from human clinical samples in the United States, and from historical bamboo slips in China.
The type strain is LMG 31106T (=CCUG 39680T) and was isolated from garden soil in the Netherlands. Its G + C content is 63.7 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31106T are publicly available through the accession numbers LR536852, LR536896-LR536898, and CABPSL010000000, respectively.
Description of Pandoraea commovens sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in Belgium and the United States, from soil samples in Belgium, and from plant roots in India.
The type strain is LMG 31010T (=CCUG 73378T) and was isolated from sputum of a cystic fibrosis patient in Belgium in 2002. Its G + C content is 62.6 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31010T are publicly available through the accession numbers LR536853, LR536902-LR536904, and CABPSA010000000, respectively.
Description of Pandoraea communis sp. nov.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical, soil and water samples in Belgium, and from soil in Australia.
The type strain is LMG 31110T (=CCUG 73383T) and was isolated from sputum of a cystic fibrosis patient in Belgium in 2012. Its G + C content is 62.6 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31110T are publicly available through the accession numbers LR536854, LR536911-LR536913, and CABPSJ010000000, respectively.
The phenotypic description is as presented above and in Table 3.
The type (and thus far only) strain is LMG 31012T (=CCUG 73380T) and was isolated from soil of a house plant in Belgium in 2003. Its G + C content is 63.7 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31012T are publicly available through the accession numbers LR536855, LR536923-LR536925, and CABPSH010000000, respectively.
The phenotypic description is as presented above and in Table 3.
The type (and thus far only) strain is LMG 31112T (=CCUG 73379T) and was isolated from garden soil in Belgium in 2003. Its G + C content is 62.3 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31112T are publicly available through the accession numbers LR536857, LR536926-LR536928, and CABPSM010000000, respectively.
The phenotypic description is as presented above and in Table 3.
Isolated from soil samples in Belgium and human clinical samples in the United States.
The type strain is LMG 31009T (=CCUG 73377T) and was isolated from maize rhizosphere soil in Belgium in 2002. Its G + C content is 63.1 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31009T are publicly available through the accession numbers LR536856, LR536929-LR536931, and CABPSF010000000, respectively.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in the United States.
The type strain is LMG 31116T (=CCUG 73389T) and was isolated from a cystic fibrosis specimen in the United States in 2006. Its G + C content is 64.7 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31116T are publicly available through the accession numbers LR536858, LR536935-LR536937, and CABPSD010000000, respectively.
The phenotypic description is as presented above and in Table 3.
Isolated from human clinical samples in Australia, Belgium, Germany, the United Kingdom, and the United States.
The type strain is LMG 31109T (=CCUG 73390T) and was isolated from a cystic fibrosis specimen in the United States in 2008. Its G + C content is 66.1 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31109T are publicly available through the accession numbers LR536859, LR536941-LR536943, and CABPSC010000000, respectively.
The phenotypic description is as presented above and in Table 3.
The type (and thus far only) strain is LMG 31114T (=CCUG 73388T) and was isolated from a cystic fibrosis specimen in the United States in 2009. Its G + C content is 62.5 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31114T are publicly available through the accession numbers LR536861, LR536974-LR536976, and CABPSK010000000, respectively.
Description of Pandoraea soli sp. nov.
Pandoraea soli sp. nov. (so'li. L. gen. n. soli of soil, the source of the type strain).
The phenotypic description is as presented above and in Table 3.
The type (and thus far only) strain is LMG 31014T (=CCUG 73382T) and was isolated from soil of a house plant in Belgium in 2003. Its G + C content is 63.6 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31014T are publicly available through the accession numbers LR536860, LR536980-LR536982, and CABPSG010000000, respectively.
The phenotypic description is as presented above and in Table 3.
The type (and thus far only) strain is LMG 31013T (=CCUG 73381T) and was isolated from soil of a house plant in Belgium in 2003. Its G + C content is 63.5 mol% (calculated based on its genome sequence). The 16S rRNA, gltB, gyrB, recA and whole-genome sequences of LMG 31013T are publicly available through the accession numbers LR536862, LR536977-LR536979, and CABPRU010000000, respectively.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the European Nucleotide Archive PRJEB30806, PRJEB30685,

FUNDING
Part of this work was performed in the framework of the Belgian National Reference Centre for Burkholderia, supported by the Ministry of Social Affairs through a fund within the National Health Insurance System. This funding agency had no role in study design, data collection and interpretation, or the decision to submit the work for publication. JL and TS receive support from the Cystic Fibrosis Foundation (United States).