Nanopore Long-Read Guided Complete Genome Assembly of Hydrogenophaga intermedia, and Genomic Insights into 4-Aminobenzenesulfonate, p-Aminobenzoic Acid and Hydrogen Metabolism in the Genus Hydrogenophaga

We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. The complete genome shows that the major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are found only in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy but, surprisingly, comparative genomic analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as “hydrogen eating bacteria.”


INTRODUCTION
The genus Hydrogenophaga consists of rod-shaped, yellow-pigmented, Gram-negative bacteria that are generally considered capable of using hydrogen as an energy source (Willems et al., 1989). Among all currently described species of Hydrogenophaga, Hydrogenophaga intermedia is one of the most studied due to its distinctive ability to efficiently degrade 4-aminobenzenesulfonate, a recalcitrant intermediate compound in the synthesis of colorants (Dangmann et al., 1996). The catabolism of 4-aminobenzenesulfonate in H. intermedia S1 is by far the most well-studied among all described 4-aminobenzenesulfonate degraders (Perei et al., 2001; Wang et al., 2009; Gan et al., 2011b; Yang et al., 2012; Hayase et al., 2016). H. intermedia S1 was isolated from wastewater and could grow on 4-aminobenzenesulfonate as the sole carbon source in a two-species bacterial coculture with Agrobacterium radiobacter S2 (Dangmann et al., 1996). To date, several enzymes associated with the downstream degradation of 4-aminobenzenesulfonate in strain S1 have been identified, cloned, characterized and functionally validated (Contzen et al., 2001; Halak et al., 2006, 2007). However, the enzyme responsible for the first deamination step converting 4-aminobenzenesulfonate to 4-sulfocatechol, which was presumed to be a dioxygenase, has yet to be identified in strain S1 (Contzen et al., 2001; Halak et al., 2006, 2007). More than 20 years after the isolation of the S1/S2 mixed culture, a second 4-aminobenzenesulfonate-degrading two-species bacterial culture consisting of Hydrogenophaga sp. PBC and Ralstonia sp. PBA, isolated from textile wastewater, was reported (Gan et al., 2011b). Transposon mutagenesis of strain PBC led to the identification of the sad operon that is responsible for the initial deamination step of 4-aminobenzenesulfonate to 4-sulfocatechol. The deamination step was presumed to operate similarly in H. intermedia strain S1 given its near-identical 16S rRNA sequence to strain PBC (Gan et al., 2011a,b, 2012c). In addition, both Hydrogenophaga strains could only be grown in axenic culture with 4-aminobenzenesulfonate as the sole carbon and nitrogen source if they were supplemented with p-aminobenzoate and biotin (Kampfer et al., 2005; Gan et al., 2011a,b). This provides insight into the syntrophic relationship between the Hydrogenophaga strains and their helper strains and suggests that the biosynthesis pathways for these compounds are either missing or non-functional in H. sp. PBC and H. intermedia S1 (Dangmann et al., 1996; Gan et al., 2011b).
Strain PBC was the first Hydrogenophaga strain to have its genome sequenced (Gan et al., 2012b). However, due to the short read length of Illumina sequencing (2 × 100 bp) at that time and the high genomic GC content of strain PBC, the assembled genome was relatively fragmented. Further, most of the genes coding for 4-aminobenzenesulfonate metabolism were located on different contigs, limiting the analysis of the genomic structure and regulation of this pathway (Hegedüs et al., 2017). Interestingly, genes coding for the biosynthesis of biotin and 4-aminobenzoic acid could not be identified in the draft genome of strain PBC, which is consistent with its requirement for these supplements for growth in axenic culture (Gan et al., 2011b; Kim and Gan, 2017). However, given that gaps exist in the draft genome, it is possible that these gene regions were simply not assembled, which emphasizes the need for a complete genome assembly to verify this observation. As more Hydrogenophaga genomes became available, H. sp. PBC was recently reclassified as H. intermedia PBC based on in silico genome-genome hybridization (Kim and Gan, 2017). Interestingly, despite the availability of various whole genome sequences for the type species of Hydrogenophaga, comparative genomic analyses of Hydrogenophaga are currently limited and most analyses have focused on a single genome (Gan et al., 2011b; Kim and Gan, 2017).
One of the defining traits of the genus Hydrogenophaga is the ability to grow chemoorganotrophically or chemolithoautotrophically by oxidizing hydrogen as an energy source (Willems et al., 1989). A [NiFe]-membrane-bound hydrogenase has been isolated and purified from H. sp. AH-42, providing the first biochemical characterization of a hydrogenase and molecular insight into the genetic components responsible for hydrogen oxidation in the genus Hydrogenophaga (Yoon et al., 2008, 2009). However, exceptions to the defining feature of the genus, e.g., inability to oxidize hydrogen under standardized culture conditions, have been reported in two Hydrogenophaga type strains, namely H. intermedia S1 T and H. atypica DSM 15342 T (Contzen et al., 2000; Kampfer et al., 2005), naturally inviting a comprehensive genomic survey of hydrogen-oxidizing genes in all sequenced members of the genus Hydrogenophaga.
To improve the genome assembly of H. intermedia PBC, we used the Nanopore MinION portable long-read sequencer to generate long sequencing reads and combined them in a hybrid assembly with the Illumina dataset generated previously by Gan et al. (2012b). We show that the incorporation of the Nanopore long reads enabled the successful assembly of the H. intermedia PBC genome into a single contig. Further, we provide an updated whole genome-based phylogeny of the family Comamonadaceae and perform, for the first time, a comparative genomic analysis of the genus Hydrogenophaga focusing on the prevalence of genomic potential for hydrogen oxidation, pABA synthesis and 4-aminobenzenesulfonate biodegradation.

Illumina and Nanopore Whole Genome Sequencing
Illumina-based whole genome sequencing and assembly of strain PBC have been previously described (Gan et al., 2012b). To generate sequencing data using the MinION device, the gDNA of strain PBC was extracted from a 5-day-old culture on nutrient agar using a slightly modified SDS-lysis method (Sokolov, 2000). Five micrograms of gDNA was subsequently used for library construction using the Nanopore sequencing kit SQK-NSK007 (Oxford Nanopore Technologies, Oxford, United Kingdom) according to the manufacturer's instructions. The library was then loaded onto an R9 MinION flowcell (Oxford Nanopore Technologies, Oxford, United Kingdom) and run for 48 h. 2D-basecalling was performed on the cloud-based Metrichor platform. Fast5 to fasta conversion was performed using Nanocall (David et al., 2016).

Hybrid Genome Assembly and Genome Annotation
SPAdes version 3.8.1 (Antipov et al., 2016) was used for an initial hybrid genome assembly incorporating both Illumina and Nanopore reads. Contigs longer than 1,000 bp were then selected for in silico scaffolding and gap-closing using npScarf (Cao et al., 2016). The assembled genome was annotated using the NCBI Prokaryotic annotation pipeline (Tatusova et al., 2016) and manually curated to include annotations for the pcaH2, pcaG2, pcaB2 and 4SLH genes coding for protocatechuate 3,4-dioxygenase alpha subunit type II, protocatechuate 3,4-dioxygenase beta subunit type II, 3-carboxymuconate cycloisomerase type II and 4-sulfomuconolactone hydrolase, respectively, which were missed by the default annotation setting. The identification and annotation of prophage sequences were performed using the PHASTER web server (Arndt et al., 2016). Visualization of the complete genome of strain PBC and genomic comparisons with publicly available Hydrogenophaga genomes (Table 1) were performed using Blast Ring Image Generator (blastN e-value setting of 1e-10) (Alikhan et al., 2011). Mapping of Nanopore reads and Illumina contigs was performed using Minimap2 (footnote 1) and subsequently visualized in Integrative Genomics Viewer version 3 (Thorvaldsdóttir et al., 2013). Illumina contigs were generated by assembling Illumina-only reads using SPAdes version 3.8.1 and retaining contigs with coverage of more than 5× and length of more than 300 bp (Antipov et al., 2016).
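The contig filter described above (coverage above 5× and length above 300 bp) can be sketched in a few lines of Python. This is an illustrative sketch rather than the script used in the study: it assumes SPAdes-style contig headers of the form `NODE_<n>_length_<bp>_cov_<depth>`, and the function names `keep_contig` and `filter_fasta` are hypothetical.

```python
import re

# SPAdes names contigs like "NODE_1_length_5230_cov_12.7" (some releases
# append "_ID_<n>"); the regex relies only on the length and cov fields.
HEADER_RE = re.compile(r"length_(\d+)_cov_([0-9.]+)")


def keep_contig(header, min_len=300, min_cov=5.0):
    """Return True if the contig exceeds both the length and coverage thresholds."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    length = int(match.group(1))
    coverage = float(match.group(2))
    return length > min_len and coverage > min_cov


def filter_fasta(lines, min_len=300, min_cov=5.0):
    """Yield the FASTA lines (header and sequence) of contigs that pass the filter."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record; sequence lines inherit the decision.
            keep = keep_contig(line[1:].strip(), min_len, min_cov)
        if keep:
            yield line
```

Reading the coverage from the header keeps the sketch self-contained; a pipeline that recomputes depth from read mappings would replace `keep_contig` with its own test.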

In Silico Genome-Genome Hybridization
Pair-wise average nucleotide identity (ANI) comparisons of strain PBC against the publicly available genome sequences of Hydrogenophaga species were performed using JSpecies V1.2 (ANIm setting) (Richter et al., 2016). Strains exhibiting pair-wise ANI of more than 95% were considered members of the same genospecies. A heatmap was plotted in R version 3 using the pheatmap package.

1 https://github.com/lh3/minimap2

Whole Genome Phylogeny
The whole proteome of each genome was predicted using Prodigal (-p meta setting) and passed to PhyloPhlAn for the identification of conserved proteins (Hyatt et al., 2010; Segata et al., 2013). The concatenated alignment generated by PhyloPhlAn was subsequently trimmed using trimAl version 1.9 (-automated1 setting) and used to construct a maximum likelihood tree using IQ-TREE version 5.15 with 1000 ultrafast bootstraps (Capella-Gutiérrez et al., 2009; Nguyen et al., 2015). Visualization and annotation of trees were performed with MEGA6 (Tamura et al., 2013).
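For orientation, the Prodigal, trimAl and IQ-TREE steps named above could be assembled as command lines along the following lines. This is a hedged sketch, not the authors' exact invocation: the file names are placeholders, the PhyloPhlAn step between gene prediction and trimming is omitted, and `-bb` is the ultrafast bootstrap option of IQ-TREE 1.x releases. The helper `phylogeny_commands` is hypothetical and only builds the argument lists without running anything.

```python
import shlex


def phylogeny_commands(genome_fasta, proteome_faa, alignment, trimmed, bootstraps=1000):
    """Build argv lists for the Prodigal, trimAl and IQ-TREE steps (nothing is executed)."""
    return [
        # Gene prediction in metagenomic mode, writing protein translations.
        ["prodigal", "-i", genome_fasta, "-a", proteome_faa, "-p", "meta"],
        # Automated alignment trimming with the heuristic suited to ML trees.
        ["trimal", "-in", alignment, "-out", trimmed, "-automated1"],
        # Maximum likelihood tree with ultrafast bootstrap replicates.
        ["iqtree", "-s", trimmed, "-bb", str(bootstraps)],
    ]


# Placeholder file names, for illustration only.
for argv in phylogeny_commands("PBC.fna", "PBC.faa", "concat.aln", "concat.trim.aln"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```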

MinION Output and Hybrid Genome Assembly
A total of 90,491 1D and 51,630 2D reads were generated from a full 48-h MinION run. After filtering for reads longer than 2,500 bp, 14,404 1D and 12,959 2D reads were retained for genome assembly. The final data output used for assembly was 173 megabases, representing approximately 33× genome coverage of strain PBC. The average read length was 6,350 bp [...] (Figure 1A: Regions 3 and 4) uniquely present in strain PBC, that were subsequently identified as prophage in origin by PHASTER. Both phage regions are incomplete and code for only two to three major phage components. Phage regions 1 and 2, on the contrary, are complete, uniquely shared by both H. intermedia strains (Figure 1A) and have the highest number of homologous proteins to Enterobacter phage Arya (NCBI Reference Sequence: NC_0310480) isolated from a termite gut.
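The read-filtering and coverage figures above follow from simple arithmetic, sketched here in Python. The genome size is not stated in this section, so the value below is back-calculated from the reported 173 megabases and roughly 33× coverage and should be read as an approximation; `filter_reads` and `fold_coverage` are hypothetical helper names.

```python
def filter_reads(read_lengths, min_len=2500):
    """Keep reads strictly longer than min_len bp."""
    return [n for n in read_lengths if n > min_len]


def fold_coverage(total_bases, genome_size):
    """Mean sequencing depth = sequenced bases / genome length."""
    return total_bases / genome_size


retained_reads = 14_404 + 12_959       # 1D + 2D reads kept after length filtering
total_bases = 173_000_000              # 173 megabases used for assembly
implied_genome = total_bases / 33      # bp, back-calculated from ~33x coverage
mean_read = total_bases / retained_reads

print(retained_reads)                  # number of reads used for assembly
print(round(implied_genome / 1e6, 2))  # implied genome size in Mb
print(round(mean_read))                # mean read length implied by the rounded totals
```

The mean read length implied by the rounded totals is about 6.3 kb, in line with the reported 6,350 bp.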

New Genomic Insight into 4-Aminobenzenesulfonate Biodegradation in Hydrogenophaga
Genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Interestingly, the sad operon appears to be fairly conserved in all currently reported Hydrogenophaga genomes, suggesting that the ability to deaminate 4-aminobenzenesulfonate may be a defining metabolic trait of Hydrogenophaga strains. However, given the diverse isolation sources of Hydrogenophaga strains (Table 1) and their presumably limited exposure to xenobiotic compounds, it is likely that the original substrate of the 4-aminobenzenesulfonate dioxygenase is a naturally occurring compound with close structural similarity to 4-aminobenzenesulfonate.
In contrast, the genomic region containing pcaH2G2, pcaB2 and 4-SLH, involved in the conversion of 4-sulfocatechol to maleylacetate, is conserved only in the Hydrogenophaga intermedia strains (Figure 1B). It is also worth noting that this region was identified as a phage region and is flanked by genes coding for IS6 transposable elements (Figure 1C), an indication of the mobility of the pcaH2G2, pcaB2 and 4-SLH gene cluster, presumably through phage transduction (Hickman et al., 2010). As expected, the presence of multiple transposase genes in this genomic region led to a fragmented initial Illumina read assembly (Figure 1C) due to the inability of short reads to resolve the repetitive regions (Phillippy, 2017). This shortcoming was overcome by integrating Nanopore long reads into the assembly, as evidenced by their ability to span repeats in the genome (Figure 1C).

H. pseudoflava and H. flava Belong to Two Genospecies Despite Their High Similarity in Genotypic and Protein Electrophoretic Profiles
Hydrogenophaga flava was initially designated as the type species of the genus Hydrogenophaga but due to its slow and unreliable growth, H. pseudoflava has been proposed for use as an alternative reference taxon for the genus as it exhibits similar genotypic and protein electrophoretic profiles to that of H. flava (Willems et al., 1989). In silico genome-genome hybridization indicates that although H. flava and H. pseudoflava are closely related to one another, they are clearly two distinct genospecies with a pairwise ANIm of 93% (Figure 2). A more in-depth analysis of the H. flava genome may be useful to identify its additional nutritional requirements for optimal and consistent growth on culture medium. Such a genomic approach has been successfully demonstrated in Clostridium tyrobutyricum and other Clostridia associated with butyric acid fermentation leading to the identification of key vitamins and several amino acids essential for growth (Storari et al., 2016). Furthermore, in silico genome-genome hybridization analysis also indicates that H. sp. H7, which was previously isolated from a coal mine in China (Table 1), could be reclassified to the species H. pseudoflava given its high pairwise ANIm (97%) to H. pseudoflava NBRC 102511 T (Figure 2).
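The genospecies assignment above applies the 95% ANI threshold stated in the methods to pairwise values. A minimal sketch of that decision rule is given below, using only the two ANIm values reported in this section (93% and 97%). The single-linkage grouping and the function name `genospecies_groups` are assumptions made for illustration, and the H. flava versus H7 comparison is left out because its value is not given here.

```python
def genospecies_groups(strains, ani_pairs, threshold=95.0):
    """Single-linkage grouping: strains joined by any pair with ANI above threshold."""
    parent = {s: s for s in strains}

    def find(s):
        # Follow parent links to the group representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), ani in ani_pairs.items():
        if ani > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


strains = ["H. flava", "H. pseudoflava NBRC 102511", "H. sp. H7"]
ani = {
    ("H. flava", "H. pseudoflava NBRC 102511"): 93.0,   # reported in the text
    ("H. sp. H7", "H. pseudoflava NBRC 102511"): 97.0,  # reported in the text
}
print(genospecies_groups(strains, ani))
```

With these two values, H. sp. H7 groups with H. pseudoflava while H. flava remains on its own, matching the interpretation given above.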

Genome-Based Phylogeny of the Family Comamonadaceae Supports Hydrogenophaga as a Monophyletic Group but Indicates Taxonomic Incongruence in the Genera Acidovorax and Comamonas
A maximum likelihood tree based on 400 universally conserved proteins provides maximal support for the monophyletic clustering of all currently sequenced Hydrogenophaga strains, with H. intermedia strains and H. sp. PML113 being basal to the rest of Hydrogenophaga (Figure 3). The genus Hydrogenophaga clade shares a sister relationship with a group of highly alkaliphilic hydrogen-utilizing Comamonadaceae bacteria isolated from alkaline serpentinizing springs at The Cedars, California (Suzuki et al., 2014). Beyond the genus Hydrogenophaga, however, considerable taxonomic incongruence was observed in the genera Acidovorax and Comamonas. Some members of Acidovorax, such as Acidovorax sp. strains 121606, 202149 and 12322-1, could be reclassified as members of the genus Comamonas based on their phylogenetic affinities. Within the genus Comamonas, Comamonas badia DSM 17552 T (Suzuki et al., 2014) is peculiar as it does not cluster with the majority of other Comamonas, but instead forms a sister group with members of the genus Alicycliphilus, albeit with moderate bootstrap support. In addition, Comamonas granuli NBRC 101663 T appears to be considerably divergent from other Comamonas species given its basal position in the clade containing all currently sequenced Comamonas strains (Suzuki et al., 2014). A more in-depth taxonomic investigation based on the percentage of conserved proteins (POCP), specifically to re-define the genus boundary between Comamonas and Acidovorax, should be undertaken (Qin et al., 2014).

Genomic Potential for Hydrogen Oxidation Is Restricted to a Few Members of Hydrogenophaga
The genus Hydrogenophaga was originally proposed by Willems et al. (1989) to group "yellow-pigmented hydrogen-oxidizing [Pseudomonas] species belonging to the acidovorans rRNA complex." However, a search with currently known HMM profiles related to the hydrogenase system identified genomic potential for hydrogen oxidation in only 7 Hydrogenophaga strains, of which 6 are closely related as shown by their monophyletic clustering in the phylogenomic tree (Figure 3). The 7 Hydrogenophaga strains include the original strains that were initially used to describe the genus Hydrogenophaga, such as H. palleronii, H. taeniospiralis, H. flava and H. pseudoflava, corroborating their previously demonstrated in vitro hydrogen-oxidizing ability (Willems et al., 1989; Contzen et al., 2000). The reported inability of H. intermedia S1 T and H. atypica DSM 15342 T to oxidize hydrogen in vitro (Contzen et al., 2000; Kampfer et al., 2005) correlates with the absence of key genes associated with hydrogen metabolism (Figure 3). However, the absence of genomic potential for hydrogen catabolism in nearly half of the currently sequenced Hydrogenophaga strains is unexpected and indicates that the description of the genus Hydrogenophaga as "hydrogen eating bacteria" is not sustainable and warrants revision.

p-Aminobenzoic Acid Auxotrophy within Hydrogenophaga
Mapping of Hydrogenophaga genome sequences to the contig of H. palleronii NBRC102513 (BCTJ01000048, 24 kbp) containing pabB revealed that only 8 out of 14 genomes exhibit nucleotide similarity (>40% identity and E-value < 1E-5) to the pabB gene region (Figure 4A). Of the eight genomes, only partial pabB hits were observed for strains H7, NBRC102514, Root209, NBRC102511 and A37, hinting at divergence at the nucleotide level. Interestingly, only two proteins, belonging to H. palleronii and H. sp. PML113, achieved an HMMscore above the domain noise cutoff score for bona fide PabB (TIGR00553) (Supplementary Figure S1). In contrast, strain IBVHS2, despite exhibiting nucleotide similarity to the pabB gene region, does not code for a protein with an HMMscore above the TIGR00553 noise cutoff. However, it is worth noting that multiple proteins with an HMMscore slightly below the TIGR00553 noise cutoff could be identified from the annotated Hydrogenophaga genomes (Supplementary Figure S1). These proteins appear to be more closely related to PabB than TrpE, as evidenced by their extremely low HMMscore to TrpE (TIGR00564) in comparison with other putative TrpE proteins and their phylogenetic affiliation to the PabB clade (Supplementary Figure S2 and Data Sheet 1). TrpE is involved in the synthesis of anthranilate from chorismate and ammonia. Functional characterization of these divergent PabB homologs will be useful to explore their role in pABA synthesis, as some of them belong to Hydrogenophaga pseudoflava, which has been shown to grow on minimal medium without pABA supplementation (Povolo et al., 2013).
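The comparison of HMM scores against a noise cutoff can be illustrated with a short parser for HMMER's tabular (`--tblout`) output. This is a sketch under stated assumptions: it uses the full-sequence bit score (column 6 of the table) rather than the domain score, the cutoff is passed in as a parameter because the TIGR00553 noise cutoff value is not given in this section, and `parse_tblout` and `classify_hits` are hypothetical names.

```python
def parse_tblout(lines):
    """Yield (target, query, full-sequence bit score) from HMMER --tblout rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header rows
        fields = line.split()
        # Columns: target, target accession, query, query accession,
        # full-sequence E-value, full-sequence score, ...
        yield fields[0], fields[2], float(fields[5])


def classify_hits(lines, noise_cutoff):
    """Split hits into those at or above the noise cutoff and those below it."""
    above, below = [], []
    for target, _query, score in parse_tblout(lines):
        (above if score >= noise_cutoff else below).append((target, score))
    return above, below
```

Hits falling just below the cutoff, like the divergent PabB homologs discussed above, end up in the second list and can then be examined separately, for example by scoring them against the TrpE model.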
The absence of pabB in H. intermedia strains PBC and S1 T is consistent with their previously reported reliance on either external nutrient supplementation or a helper strain for pABA (Dangmann et al., 1996; Gan et al., 2011b; Kim and Gan, 2017). A recent genomic analysis of the Hydrogenophaga intermedia PBC helper strain, Ralstonia sp. PBA, showed the presence of two PabB proteins, one of which is fused to PabC (Gan et al., 2012a; Kim and Gan, 2017), potentially enabling the overproduction of pABA to sustain the growth of both strains when they are co-cultured in vitamin-free minimal medium containing 4-aminobenzenesulfonate as the sole carbon and nitrogen source. A closer comparison of the genomic region containing genes spatially associated with pabB in H. intermedia PBC and H. palleronii NBRC102513 suggests that pabB may have been lost through gene deletion (Figure 4B). Surprisingly, strains LPB0072, RAC07, Root209 and IBVHS1 also appear to lack the genomic potential for pABA synthesis. A future study investigating the ability of these strains to grow on defined medium without pABA supplementation will be necessary to verify their predicted pABA auxotrophy, as most Hydrogenophaga strains were isolated on nutrient-rich medium, which will contain trace amounts of pABA. For example, strains Root209 and RAC07 were maintained on tryptic soy agar and minimal medium with yeast extract supplementation, respectively (Bai et al., 2015; Fixen et al., 2016).

DATA ACCESS
The complete genome of Hydrogenophaga intermedia PBC has been deposited under the accession number CP017311. Raw Illumina and Nanopore data have been deposited in the SRA under the project ID SRP092076.

AUTHOR CONTRIBUTIONS
HG performed gDNA extraction and bioinformatics analysis, and drafted the manuscript. YL performed library preparation and MinION sequencing. CA supervised the study.

ACKNOWLEDGMENTS
Funding for this study was provided by the Monash University Malaysia Infrastructure grant awarded to CA. HG and YL are grateful to the Monash University Malaysia Tropical Medicine and Biology Multidisciplinary Platform for financial support.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2017.01880/full#supplementary-material
FIGURE S1 | Sequence scores of Hydrogenophaga proteins with significant hits (E-value < 1E-5) to TIGR00564 or TIGR00553. Dotted lines represent the TIGRFAM model noise cutoff bit score thresholds. Red branches indicate proteins with high sequence scores to TIGR00564 corresponding to bona fide TrpE. Blue and green branches indicate proteins with low scores to TIGR00564 but high scores to TIGR00553 (bona fide PabB). In addition, proteins associated with blue branches exhibit scores that are either higher than or very close to the noise cutoff score of TIGR00553.
DATA SHEET 1 | Hydrogenophaga protein sequences used for the construction of maximum likelihood tree.