Phylogenomic Classification and Biosynthetic Potential of the Fossil Fuel-Biodesulfurizing Rhodococcus Strain IGTS8

Rhodococcus strain IGTS8 is the most extensively studied model bacterium for biodesulfurization of fossil fuels via the non–destructive sulfur–specific 4S pathway. This strain was initially assigned to Rhodococcus rhodochrous and later to Rhodococcus erythropolis thus making its taxonomic status debatable and reflecting the limited resolution of methods available at the time. In this study, phylogenomic analyses of the whole genome sequences of strain IGTS8 and closely related rhodococci showed that R. erythropolis and Rhodococcus qingshengii are very closely related species, that Rhodococcus strain IGTS8 is a R. qingshengii strain and that several strains identified as R. erythropolis should be re-classified as R. qingshengii. The genomes of strains assigned to these species contain potentially novel biosynthetic gene clusters showing that members of these taxa should be given greater importance in the search for new antimicrobials and other industrially important biomolecules. The plasmid-borne dsz operon encoding fossil fuel desulfurization enzymes was present in R. qingshengii IGTS8 and R. erythropolis XP suggesting that it might be transferable between members of these species.


INTRODUCTION
The implementation of green and economic technologies in the petroleum industry is of increasing interest. This is particularly important in view of the continuously increasing global demand for cleaner fuels, stricter environmental regulations and dwindling oil prices (Sahu et al., 2015;Head and Gray, 2016;Ismail et al., 2017). The biotechnological desulfurization of crude oil and transportation fuels, such as diesel and gasoline, holds the potential to overcome or mitigate the technical, economic and environmental hurdles of conventional thermochemical desulfurization technologies (Kilbane, 2006;Mohebali and Ball, 2016). Rhodococcus strain IGTS8 (ATCC 53968) was the first bacterium to be isolated based on its unique ability to break C-S bonds of fuel-born organosulfur compounds and utilize them as a sole sulfur source (Denome et al., 1994). This discovery raised the prospect of using biocatalysts for removal of sulfur from fossil fuels via biodesulfurization (Kilbane and Bielaga, 1990;Kilbane II and Jackowski, 1992).
The biodesulfurization reactions are catalyzed by two monooxygenases (DszC and DszA) and a desulfinase (DszB) which constitute the non-destructive 4S pathway. This pathway was first reported and characterized in the IGTS8 strain which for many years has served as the most commonly adopted model for biodesulfurization studies. The genes of the 4S pathway, dszA, dszB, and dszC, are carried as an operon on a linear megaplasmid in the IGTS8 strain (Kilbane and Stark, 2016). These genes have been found in R. erythropolis strains A66 and A69 which use various organosulfur and organonitrogen compounds as sole sources of sulfur and nitrogen, respectively (Santos et al., 2007). DszC shows sulfur acquisition oxidoreductase (SfnB family) and acyl-CoA dehydrogenase activities and catalyzes the conversion of dibenzothiophene to dibenzothiophene sulfone (Gray et al., 1996). DszA, a FMN-dependent oxidoreductase of the nitrilotriacetate monooxygenase family, converts dibenzothiophene sulfone to 2-(2 -hydroxyphenyl) benzenesulfinate which is catalytically converted to 2-hydroxybiphenyl and sulfite by the desulfinase DszB.
The importance of resolving the taxonomic status of rhodococci which show remarkable catabolic activity is often overlooked despite significant improvements in the classification of the genus Rhodococcus. In the present study, the whole genome sequence of Rhodococcus strain IGTS8 was compared with those of R. erythropolis and R. qingshengii strains to establish its taxonomic status and to compare the biosynthetic and biodesulfurization capabilities of strains assigned to these species.

Bacterial Strain
Rhodococcus strain IGTS8 (American Type Culture Collection, ATCC 53968) was isolated by Kilbane and colleagues at the Gas Technology Institute-Chicago based on its sulfur-specific fuel biodesulfurization capabilities (Kilbane and Bielaga, 1990;Kilbane II and Jackowski, 1992).

Genomic DNA Extraction and Sequencing
Rhodococcus strain IGTS8 was grown overnight in nutrient broth at 28 • C and the cells pelleted by centrifugation. Genomic DNA was extracted with the Wizard Genomic DNA purification kit (Promega) using the manufacturer's instructions for Gram positive bacteria and a library constructed with the Nextera XT DNA Library Prep kit (Ilumina) using 1 ng of genomic DNA. The library was evaluated with a 2100 Bioanalyzer (Agilent Technologies) before sequencing on an Illumina Miseq system (2 × 150 paired end reads).

Genome Assembly and Computational Analyses
The PCR duplicates in the libraries were filtered with FastUniq 1.1 (Xu et al., 2012) and low-quality (<30) and adapter sequences trimmed with cutadapt 1.14 (Martin, 2011). Contaminant screening was performed with FastQ Screen 0.11.1 (Wingett and Andrews, 2018) to filter reads mapping on the PhiX sequence, vector sequences in the NCBI UniVec database 1 and other species including Homo sapiens (hg19), Arabidopsis thaliana (TAIR10), Escherichia coli (NC_000913.3), Stenotrophomonas maltophilia (NC_010943.1) and Pseudomonas aeruginosa (NC_002516.2). The remaining singleton reads in the data were excluded with the tool makepairs of the Pairfq algorithm 2 and the quality of the data checked with FastQC 3 . The genome sequence was assembled with SPAdes 3.10.1 (Bankevich et al., 2012) using kmer values of 25, 35, 45, 55, 65, 85, 105 and 125 in the gage mode. The quality of the assembly was assessed with QUAST 4.5 (Gurevich et al., 2013). A 16S rRNA gene sequence extracted from strain IGTS8 using BARRNAP 0.8 (Seemann, 2013) was BLAST-searched into the GenBank (Camacho et al., 2009). An extended scaffold was created from the reference genome of Rhodococcus qingshengii strain djl6-2 using the software Ragout 2.0 (Kolmogorov et al., 2014) and the resulting assembly annotated using the RAST-SEED pipeline (Aziz et al., 2008).
The genome sequences of other R. qingshengii strains and those of the closely related Rhodococcus erythropolis were obtained from the GenBank ( Table 1). These genomes were also annotated using the RAST pipeline in order to have an equivalence of annotation for the comparative analyses. The genome sequences were compared using Roary (Page et al., 2015). A phylogenetic tree was calculated from concatenated nucleotide sequences of the core genome (1985 genes) after removing the sites with gaps using IQ-Tree (Nguyen et al., 2015). Pairwise digital DNA-DNA hybridization (dDDH) values were calculated using GGDC 2.1 (Auch et al., 2010) and average nucleotide identity (ANI) values determined using FastANI (Jain et al., 2018) between the strains within the dataset. Average amino acid identities (AAI) from reciprocal best hits were calculated between R. erythropolis NBRC 15567 T , R. qingshengii JCM 15477 T and the IGTS8 strain using the web server (Rodriguez-Rivera and Konstantinidis, 2014) 4 .

Biosynthetic and Biodesulfurization Potential
Whole genome sequences of all of the strains were analyzed using antiSMASH v5.1.1 to identify biosynthetic 4 http://enve-omics.ce.gatech.edu/aai/ gene clusters (BGCs) using "Strict" detection criteria and additional features (KnownClusterBlast, ClusterBlast, SubClusterBlast, ActiveSiteFinder and Cluster Pfam analysis; Blin et al., 2019). The BGCs from the strains were sorted into groups based on their predicted activity. The protein sequences of the biodesulfurization enzymes DszA, DszB and DszC were obtained from the GenBank (accession number: L37363) and sought within the dataset using BLASTP (Camacho et al., 2009), as mentioned above.

Genomic Features
The sequencing reads for strain IGTS8 were assembled into 25 contigs ( Table 2). The largest scaffold was 6.4 Mb in size (digital GC content 61.7 mol%) and was annotated with 6,132 coding DNA sequences, 2 rRNA and 54 tRNA genes. Other smaller contigs were potential plasmid sequences that varied from 1.2 to 88.4 kb in size and contained 0-94 genes ( Table 2). The genome assembly of the IGTS8 strain

Phylogenomic Characterization
A BLAST search of an extracted 16S rRNA gene sequence from the genome of Rhodococcus strain IGTS8 showed 99.92% identity to R. erythropolis strain X5 and R. qingshengii strain RL1. Consequently, we downloaded the genome sequences of 29 R. erythropolis and R. qingshengii strains available from GenBank and calculated dDDH values against strain IGTS8 ( Table 3). The dDDH value between the latter and R. erythropolis NBRC 15567 T was 62.1% which is below the accepted species cut-off of 70% (Auch et al., 2010). In contrast, the dDDH value between strain IGTS8 and R. qingshengii JCM 15477 T was 98.2%, hence well above this threshold. The corresponding ANI values between strain IGTS8 and the type strains of R. erythropolis and R. qingshengii were 95.48 and 99.68, respectively, that is slightly and well above the 95% recognized threshold for the delineation of species (Konstantinidis and Tiedje, 2005). Similarly, the AAI value between strain IGTS8 and R. qingshengii JCM 15477 T (99.31%; SD: 5.07%) was much higher than that calculated against R. erythropolis NBRC 15567 T (96.67%; SD: 7.09%).
In addition, partial sequences of the catA and gyrB genes of R. qingshengii strain djl-6 T (= DSM 45222 T = JCM 15477 T ) were obtained from the GenBank (accession numbers KF500432.1 and KF374699.1, respectively). BLAST search (Altschul et al., 1997;Camacho et al., 2009) for these genes not only gave single hits in the genome of strain IGTS8 but also showed 100% identity in each case. These results indicate that strain IGTS8 is more closely related to the R. qingshengii strain than to the type strain of R. erythropolis. The size and digital GC content of the genome of strain IGTS8 were similar to those of the strains received as R. erythropolis and R. qingshengii. These values ranged from 6.28 to 7.43 Mb and from 61.7 to 62.5 mol%, respectively (Table 1). A phylogenetic tree generated from concatenated core genome sequences of these strains showed that they fell into two distinct clades, one encompassing 13 R. erythropolis strains including R. erythropolis NBRC 15567 T and the other all of the R. qingshengii strains together with 7 strains received as R. erythropolis (Figure 1). Pairwise dDDH calculations indicated the presence of three species in the dataset (Supplementary Table S1). These results are consistent with previous studies in which R. erythropolis and R. qingshengii strains fell into a single taxon, namely species-group D (Sangal et al., 2016(Sangal et al., , 2019 Table S2) and 98.16-98.79% (average: 98.56 ± 0.13), respectively which is well above the cut-off point for assigning them to these species. Pairwise ANI values between the strains from the two clades were marginally higher than the 95-96% threshold suggesting that they might belong to a single species (ANI values: 95.11-96.43%; average: 95.53 ± 0.21). The corresponding AAI value between R. erythropolis NBRC 15567 T and R. qingshengii JCM 15477 T (AAI: 96.64%; SD: 7.14%) underpins this point.
R. qingshengii was proposed as a novel species based on phenotypic differences and an experimental DDH value of 23.8% between strain djl-6 T (JCM 15477 T ) and R. erythropolis DSM 43066 T (NBRC 15567 T ; Xu et al., 2007). Computational DDH values between the individuals of the two clades defined in the present study were 61.7-68.9% (average: 62.9 ± 1.4), values that are much higher than the corresponding experimental value between the R. qingshengii and R. erythropolis type strains but still below the 70% cut-off value (Supplementary Table S1). An ANI value of 96% was found to be suitable for Aeromonas species delineations that correlated with the dDDH cut-off of 70% (Colston et al., 2014). Similarly, in the present study ANI and dDDH values were found to be strongly correlated (R 2 = 0.9932; Figure 2). It has been pointed out that the use of a universal cut-off point is not appropriate as ANI values reflect the methods used to determine them and that the evolutionary history of the taxa under study should be considered (Palmer et al., 2020).
A comparison of all 30 genomes revealed a pangenome encompassing 32,515 genes, including 1,985 core genes present amongst all of the strains (Supplementary Table S3). Based on the default protein BLAST identity of 95%, only 34 and 35 genes were found to be unique to strains in the R. erythropolis and R. qingshengii clades, respectively. However, 53-60% of these genes encode hypothetical proteins (Supplementary Table S3). For the remaining clade-specific genes, homologs (other genes encoding proteins with the same functions) were observed among the other clades (Supplementary Table S3). The default 95% identity cut-off is highly stringent for strains of different species which has potentially separated some of the orthologous genes as clade-specific genes.
Many genes (218) were found to be specific to strain IGTS8 with 116 encoding hypothetical proteins or proteins of unknown function and 10 belonging to mobile genetic elements such as transposases (Supplementary Table S3). Again, homologs of the remaining 92 genes with annotated functions were present among other strains. However, five genes encoding 3-hydroxy-3-isohexenylglutaryl-CoA:acetate lyase (re30.peg.6706), acetophenone carboxylase subunit Apc3 (re30.peg.6438), cyanate ABC transporter substrate binding protein (re30.peg.6484), a possible dicarboxylate carrier protein (re30.peg.6646) and a small heat shock protein (re30.peg.4427) appear to be specific to IGTS8 and may have introduced minor functional variations in strain IGTS8 in comparison to other R. qingshengii strains. Molecular investigations are required to understand the functional impact of these genes and those encoding hypothetical proteins on the strain IGTS8.
The genomic data from this study not only underline the close relationship between R. erythropolis and R. qingshengii strains (Xu et al., 2007;Dastager et al., 2014;Táncsics et al., 2014;Sangal et al., 2016Sangal et al., , 2019 but also show that the case for assigning them to a single species is problematic. In such instances, recommended thresholds of genomic indices for species delineation not only need to be interpreted with care but should also take into account the broader taxonomic properties of the organisms (Colston et al., 2014;Riesco et al., 2018). Given the assignment of the IGTS8 strain to a distinct phylogenomic clade corresponding to R. qingshengii (Figure 1), supporting in silico dDDH similarity values ( Table 3) and genomic features (Supplementary Tables S1, S2) together with available phenotypic data (Gray et al., 1996;Xu et al., 2007;Jones and Goodfellow, 2012), it is prudent to continue to recognize the taxonomic status of R. erythropolis and R. qingshengii.

Biosynthetic Potential
The number of predicted biosynthetic gene clusters (BGCs) was similar amongst the R. erythropolis and R. qingshengii strains ranging between 13 and 24 (Supplementary Table S4). The draft assemblies with higher numbers of contigs are predicted to have more gene clusters than the finished genomes, e.g., R. erythropolis strain S-43 with 533 contigs has the highest number of gene clusters. The six complete genomes of strains assigned to the two species have 15-17 BGCs (Figure 1; Supplementary Table S4). These observations are consistent with those of Sanchez-Hidalgo et al. (2018) who identified more gene clusters in genomes with smaller contigs amongst Amycolatopsis strains. The BGCs were assigned to 12 types all of which were detected in the genomes of both the R. erythropolis and R. qingshengii strains. In contrast, Ceniceros et al. (2017) found that predicted BGCs in rhodococci, including R. erythropolis strains, were clade-specific and lacked any homology with gene clusters encoding known natural products.
AntiSMASH predicts BGCs and potential products based on the percentage of genes from the closest known BGC showing significant BLAST hits to the query genome (Blin et al., 2019). In this study, twelve gene clusters including four BGCs potentially encoding bacteriocin, ectoine, heterobactin A/heterobactin S2 (non-ribosomal peptide synthetase, NRPS) and erythrochelin (NRPS) were highly conserved among the R. erythropolis or R. qingshengii strains and were predicted among 21-30 genomes (Supplementary Table S4). The cluster potentially encoding bacteriocin was present in all 30 strains with 75-100% of genes similar to known bacteriocin-producing clusters (Supplementary Table S4). Bacteriocins are commonly produced by bacteria to inhibit other strains competing for resources, some are extensively used in the food industry (Riley and Wertz, 2002;Settanni and Corsetti, 2008;Yang et al., 2014). The BGC predicted to encode ectoine biosynthesis was also identified in all 30 strains. Ectoine, a protective molecule, enables bacteria to survive in extreme conditions (Graf et al., 2008), indicating that R. erythropolis and R. qingshengii strains may be able to withstand harsh environmental conditions such as osmotic stress. In addition, it is used commercially as an osmotic stabilizing agent in healthcare and cosmetics (Kunte et al., 2014;Czech et al., 2018).
The number of NRPS clusters ranged from 5 out of 15 in R. erythropolis strain PR4 to 16 out of 24 in R. erythropolis strain VSD3 (Supplementary Table S4). Five NRPS gene clusters were highly conserved in the dataset, including a cluster with 90-100% of the genes coding for heterobactin A/heterobactin S2 biosynthesis (Supplementary Table S4). Only R. erythropolis strains NBRC 15567 T and S-43 showed lower proportions of genes (36 and 63%, respectively) similar to those coding for heterobactin A/heterobactin S2 biosynthetic enzymes. Heterobactins are siderophores involved in iron uptake which is essential for bacterial growth. Heterobactin biosynthesis was previously detected in R. erythropolis PR4 and the IGTS8 strains (Carran et al., 2001;Bosello et al., 2013). A predicted erythrochelin BGC was found to be conserved among 27 genomes (Supplementary Table S4). However, the corresponding cluster in R. qingshengii strain JCM 15477 T was shown to encompass a relatively lower proportion (42%) of genes when compared to known erythrochelin BGCs. Erythrochelin is another class of siderophores involved in iron uptake (Robbel et al., 2010). A third NRPS cluster showed partial gene similarities (18-27% of the genes) to that encoding coelichelin biosynthesis, a siderophore reported in Streptomyces coelicolor (Challis and Ravel, 2000). Siderophores have a wide range of roles as biosensors and as bioremediation and chelation agents involved in weathering of soil minerals and in enhancing plant growth (Ahmed and Holmstrom, 2014).
One of the conserved NRPS biosynthetic clusters present in the genomes of 22 of the strains showed limited gene similarities (3-4% of the genes) to those encoding rifamorpholines A-E (Supplementary Table S4). Rifamorpholines A-E are macrolactams, a new subclass of rifamycin antibiotics produced by Amycolatopsis strain HCa4 which show antimicrobial activity against methicillin-resistant Staphylococcus aureus . A NRPS-terpene hybrid cluster was detected in the genomes of 27 strains with 6% of genes similar to a BGC responsible for biosynthesis of SF2575 (an anticancer tetracycline; Pickens et al., 2009).
Conserved BGCs include a butyrolactone and a lanthipeptide cluster (no known similarities to any existing BGC), a cluster for biosynthesis of a linear azol(in)e-containing peptide (LAP) type (11% of the genes similar to those encoding diisonitrile antibiotic SF2768), a terpene biosynthesis cluster (18-42% of the genes similar to ones encoding an isorenieratene pigment) and a type 1 polyketide synthase (T1PKS; 6-8% similar to a kirromycin-encoding cluster). Butyrolactones are involved in intracellular signaling and are intermediates in the biosynthesis of other molecules that have an important role in metabolism and stress response (Du et al., 2011). Lanthipeptides, also known as RiPPs (ribosomally synthesized and post-translationally modified peptides) are involved in multiple biological activities, including inhibition of pathogenic bacteria (Repka et al., 2017;Ongey et al., 2018). Diisonitrile SF2768 is responsible for chelation and transport of copper and has antifungal properties (Wang et al., 2017;Xu and Tan, 2019) while polyketides are known as antimicrobial, immune-suppressing and antifungal agents (Gomes et al., 2013).
In addition to the conserved BGCs, other clusters were present in some of the R. erythropolis and R. qingshengii strains (Supplementary Table S4). For instance, the genomes of R. qingshengii strains JCM 6824, JCM 15477 and IGTS8 contained a cluster showing 81% similarity to genes encoding arylpolyene (aurachin RE) and a cluster in the genome of R. erythropolis strain JCM 9804 had 60% of the genes similar to those encoding albachelin biosynthesis. Arylpolyenes are bacterial pigments structurally and functionally similar to carotenoids (Schoner et al., 2016) and are active as antimicrobial and antioxidant agents (Narsing Rao et al., 2017). Albachelin is a siderophore that was initially isolated from Amycolatopsis alba (Kodani et al., 2015). Other BGCs showed either no or limited similarities to known clusters and are likely to produce novel metabolites. Similarly, other NRPS BGCs showed partial similarities to different antibiotic-producing clusters (Supplementary Table S4). These clusters appear quite diverse compared with previously characterized clusters and may encode novel antibiotics and hence need molecular characterization.
Biodesulfurization Genes-the dsz Operon R. qingshengii strain IGTS8 is the prototype of fuelbiodesulfurizing bacteria harboring the 4S biodesulfurization pathway which allows the use of diesel-born organosulfur compounds, such as dibenzothiophene as a sole sulfur source, converting it to 2-hydroxybiphenyl and releasing the sulfur atom as sulfite (Oldfield et al., 1998). The 4S pathway enzymes are encoded by the plasmid-borne dsz operon (Denome et al., 1994;Santos et al., 2007;Khan et al., 2017). A search for this dsz operon (Piddington et al., 1995) amongst the R. qingshengii and R. erythropolis strains showed that it was only present in the genome of R. erythropolis XP. This soil isolate, which possesses a plasmid carrying the dsz operon, is able to desulfurize benzonaphthothiophene (Yu et al., 2006;Tao et al., 2011). Interestingly, R. qingshengii IGTS8 and R. erythropolis XP strains belong to different rhodococcal clades (Figure 1) suggesting that the plasmid carrying the dsz operon may be transferable between R. qingshengii and R. erythropolis strains via horizontal gene transfer. This proposition is in line with the fact that the dsz genes are usually carried on conjugative plasmids close to transposable elements, such as insertion sequences (Kayser et al., 2002;Kilbane and Le Borgne, 2004;Mohebali and Ball, 2016).

CONCLUSION
The phylogenomic analyses show that R. erythropolis and R. qingshengii strains are members of very closely related species that belong to the R. erythropolis species-group. They also show that the catabolically active organism Rhodococcus strain IGTS8 is a bona fide member of R. qingshengii. Strains assigned to each of these species were shown to have genomes that contained 12 highly conserved BGCs, most of which showed limited similarity with clusters coding for the biosynthesis of known specialized metabolites thereby providing further evidence that rhodococci should feature prominently in drug discovery programmes. The finding that the genome of R. erythropolis strain XP, like that of R. qingshengii strain IGTS8, contains the biodesulfurization dsz operon suggests that plasmids carrying this operon may be readily transferable between R. erythropolis and R. qingshengii strains.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
WI and HM conceived the study. VS designed the strategy for the data analyses. VC and SK extracted genomic DNA, performed sequencing and assembled the reads into scaffolds. DT and VS analyzed the data. MG helped with the taxogenomic analyses and classification of strain IGTS8. HM, DH, CC, and AV provided intellectual input on the biosynthetic and catabolic potential of the bacterial strains. VS, WI, and MG drafted the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This study was a part of a research project funded by the Arabian Gulf University (Grant No. LS_WE_2018).