Draft Genome Sequence of the Nitrogen-Fixing Rhizobium sullae Type Strain IS123T Focusing on the Key Genes for Symbiosis with its Host Hedysarum coronarium L.

The prominent feature of rhizobia is their molecular dialogue with plant hosts. Such interaction is enabled by the presence of a series of symbiotic genes encoding the synthesis and export of signals triggering organogenetic and physiological responses in the plant. The genome of the Rhizobium sullae type strain IS123T, which nodulates the legume Hedysarum coronarium, was sequenced and resulted in 317 scaffolds for a total assembled size of 7,889,576 bp. Its features were compared with those of genomes from rhizobia representing an increasing gradient of taxonomical distance, from a conspecific isolate (Rhizobium sullae WSM1592), to two congeneric cases (Rhizobium leguminosarum bv. viciae and Rhizobium etli) and up to different genera within the legume-nodulating taxa. The host plant is of agricultural importance, but, unlike the majority of other domesticated plant species, it is able to survive quite well in the wild. Data showed that the type strain of R. sullae, isolated from a wild host specimen, is endowed with a richer array of symbiotic genes in comparison to other strains, species or genera of rhizobia that were rescued from domesticated plant ecotypes. The analysis revealed that the bacterium by itself is incapable of surviving in the extreme conditions that its host plant can tolerate. When exposed to drought or alkaline conditions, the bacterium depends on its host to survive. Data are consistent with the view of the plant phenotype as the primary factor enabling symbiotic nitrogen-fixing bacteria to survive in otherwise limiting environments.

INTRODUCTION
The interaction between rhizobia and legumes in root nodules is an essential element in sustainable agriculture, as this symbiotic association is able to enhance biological fixation of atmospheric nitrogen (N2), and is also a paradigm in plant-microbe signaling (Young et al., 2006; Giraud et al., 2007; Wang et al., 2012). Knowledge of the whole genome would allow the specific features of each rhizobium to be identified. The prominent feature of this group of bacteria is their molecular dialogue with plant hosts, an interaction that is enabled by the presence of a series of symbiotic genes encoding the synthesis and export of signals triggering organogenetic and physiological responses in the plant (Spaink et al., 1987; Long, 2001). In recent years, significant progress has been made in resolving the complex exchange of signals responsible for nodulation through genome assembly, mutational and expression analysis, and proteome characterization of legumes (e.g., Sato et al., 2008; Young et al., 2011; Marx et al., 2016) and rhizobia (e.g., Giraud et al., 2007; Tolin et al., 2013; Čuklina et al., 2016; Remigi et al., 2016). In a previous study (Squartini et al., 2002), we described a novel species, R. sullae, that specifically induces symbiotic nodulation in the legume sulla (Hedysarum coronarium L. syn. Sulla coronaria [L.] Medik.; Faboideae; Hedysareae). We had previously provided the first description of the infection process of this legume by its bacterial symbiont and their morphological peculiarities (Squartini et al., 1993). Sulla is found in the Mediterranean basin with a distribution from northern Africa to southern Spain and southern Italy.
It is of particular importance in agriculture due to its ability to adapt to drought and coastal conditions (Douglas, 1984), and is therefore an ideal subject for studying salt tolerance (range limit 150-700 mM NaCl), alkaline tolerance (up to pH 9-10.5), and drought stress (ranging from 0.5 to −0.95 MPa for PEG; Fitouri et al., 2012; Issolah et al., 2012). Biochemical and genetic characterization of several bacterial strains nodulating sulla (Struffi et al., 1988; Muresu et al., 2005) allowed us to select R. sullae isolate IS123T (= USDA 4950T = DSM 14623T) as type strain. Phylogenetic analyses (Squartini et al., 2002) suggest it is closely related to the widely studied congeneric Rhizobium leguminosarum (symbiont of peas) and Rhizobium etli (symbiont of beans).
Focusing, in particular, on the genes ruling the symbiotic association with the host plant, we sequenced the genome of the R. sullae type strain IS123T in order to: (1) compare this genome with those of other members of the order Rhizobiales, including a conspecific isolate (R. sullae WSM1592; Yates et al., 2015), two congeneric cases (R. leguminosarum bv. viciae and R. etli), and various genera within the legume-nodulating taxa; (2) determine whether or not the type strain of R. sullae, which comes from a plant that still grows in the wild, carries a richer array of genes for the symbiotic interaction with its host; and (3) assess whether or not the traits allowing the host plant to endure extreme soil conditions (drought and alkalinity) are also mirrored by appropriate determinants in the bacterial genome.

MATERIALS AND METHODS
The R. sullae strain IS123T has been previously described as a new species by Squartini et al. (2002). Genomic DNA was isolated from exponentially growing liquid cultures in Yeast-Mannitol broth. Cultures were lysed and washed, the DNA was extracted using chloroform/phenol, and, after ethanol precipitation, it was purified using a Qiagen DNeasy blood and tissue kit, according to the manufacturer's protocol (Qiagen, Hilden, Germany). Libraries were prepared with the TruSeq DNA Library Preparation Kit (Illumina Inc.), as described by the manufacturer. DNA was sequenced using the Illumina HiSeq platform, and the sequences were assembled with Edena (Exact DE Novo Assembler; Hernandez et al., 2008), which uses an overlap layout consensus algorithm, with an overlap cutoff of 47 bases. Assembled contigs were further scaffolded using SSPACE Basic (version 2.0; Boetzer et al., 2011). The scaffolded version of the assembled genome was used for gene prediction and annotation using two independent pipelines: RAST, available at http://rast.nmpdr.org/ (Aziz et al., 2008), and the Prokka bacterial genome annotation tool (Seemann, 2014), which uses Prodigal (Hyatt et al., 2010) for prokaryotic gene identification. When running Prokka, the order Rhizobiales was selected as the reference taxon in order to increase the robustness of the annotations. The Prokka and RAST outputs revealed slight differences in the functional annotations, so we manually checked them, focusing on the symbiotic nodulation and nitrogen fixation genes. The curated version of the Prokka output was used as the final annotation of R. sullae. To ascertain the similarity between the assembled genome and previously published genomes of Rhizobium etli and R. sullae WSM1592, we estimated average nucleotide identity (ANI), as previously described by Goris et al. (2007). The ANI analysis was limited to the taxa which were expected to share the highest identity with the strain under study.
A whole-genome orthology search to identify conserved functions across different organisms was run on six additional rhizobial genomes using the Reciprocal BEST BLAST HIT (RBBH) (Ward and Moreno-Hagelsieb, 2014) with an E-value cutoff of 1E-5. Genes with identity >60% and coverage higher than 60% were considered orthologous.
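The filtering logic of a reciprocal best hit search can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Hit` record, the function names, the choice of bit score to rank hits, and the requirement that both directions satisfy the identity and coverage thresholds are assumptions made for illustration. Only the numerical cutoffs (E-value of 1E-5, identity and coverage above 60%) come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment record (hypothetical layout, not a BLAST format)."""
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    coverage: float   # percent of the query covered by the alignment
    evalue: float
    bitscore: float


def best_hits(hits, max_evalue=1e-5):
    """Keep, for every query, the highest-scoring hit that passes the E-value cutoff."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, min_identity=60.0, min_coverage=60.0):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    Assumption: the identity and coverage thresholds are applied to both
    search directions, and the comparison is strict (> 60%).
    """
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    pairs = []
    for gene_a, hit in forward.items():
        back = backward.get(hit.subject)
        if back is None or back.subject != gene_a:
            continue  # not reciprocal
        if (min(hit.identity, back.identity) > min_identity
                and min(hit.coverage, back.coverage) > min_coverage):
            pairs.append((gene_a, hit.subject))
    return sorted(pairs)
```

In practice the two hit lists would be parsed from the forward and reverse searches between a pair of proteomes; the gene identifiers used to exercise this sketch are placeholders, not loci from the genomes analyzed here.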
The MAPLE resource (Metabolic And Physiological potentiaL Evaluator; available at: http://www.genome.jp/tools/maple/help.html) was used to estimate function abundance and evaluate metabolic and physiological potential. The reference database was KEGG (Kyoto Encyclopedia of Genes and Genomes), and R. sullae proteins were mapped and normalized on the ribosomal protein counts in its pathway database. KAAS (KEGG Automatic Annotation Server) was used for ortholog assignment (KO, KEGG Orthology) and pathway mapping. The PHASTER tool was used to search for sequences of phages and prophages (Arndt et al., 2016). To ascertain whether any genes relevant for rhizobia could be missing, an HMM (Hidden Markov Model) search of the Rhizobiales was carried out to extract the core pan-genome, which was aligned with the genomes of the other rhizobia and scored for conservation percentage. The analysis was carried out using bcgTree, which is a bacterial core gene analyzer (http://www.dna-analytics.biozentrum.uni-wuerzburg.de; Ankenbrand and Keller, 2016).

RESULTS
17,902,513 paired end (PE) reads (2 × 100 bp) were obtained, accounting for a total of 9.1 Gb sequence information.
The draft genome assembly contains 447 contigs with a total assembly size of 7,889,725 bp and an N50 of 72.64 kb, and 317 scaffolds with a total assembly size of 7,889,576 bp and an N50 of 118.24 kb. The longest contig was 296,399 bp, the longest scaffold 488 bp. The whole genome GC content was 59.88% (Table 1). Sequence annotation revealed 7,776 protein coding sequences (CDSs), which is more than those found in the other R. sullae, R. leguminosarum, and R. etli genomes reported so far. A total of 51 tRNA genes and 6 rRNA (rrn) operons were identified.
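For readers unfamiliar with the assembly statistics quoted above, the following minimal sketch shows how N50 and GC content are conventionally computed. The function names are illustrative, and this is not the code used to produce the values reported here. N50 is taken as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)
```

Applied to the list of scaffold lengths and to the concatenated scaffold sequences, these two functions would yield figures of the kind reported in Table 1.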
The raw reads of the R. sullae IS123T genome are available in the European Bioinformatics Institute (EBI) database under project number PRJEB9435. The genome assembly can be accessed under the code ERZ403196 (Sample id: ERS738956, Assembly accession: GCA_900169785, WGS account: FWER01; Scaffold accession range: FWER01000001-FWER01000317).
The gene predictions, CDS FASTA files, and the corresponding proteins are all provided as Supplementary Datasets 2-6.
Average nucleotide identity (ANI; Goris et al., 2007; available from http://enve-omics.ce.gatech.edu/ani/) was calculated to assess the similarity between the assembled genome and the genomes of R. sullae WSM1592 and R. etli CFN42. The analysis revealed a high average ANI (97-98%) between the sequenced genome and the genomes of the two afore-mentioned related species (Figures 1, 2), confirming their evolutionary closeness.
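The principle behind a fragment-based ANI estimate can be sketched as follows. This is a simplified illustration and not the implementation behind the web tool cited above: the fragment length of 1,020 nt and the filters (more than 30% identity over at least 70% of the fragment) are the values commonly attributed to the Goris et al. (2007) procedure and are treated here as assumptions, the alignment step itself is omitted, and the function names are hypothetical.

```python
FRAGMENT_LEN = 1020  # assumed fragment size, to be checked against Goris et al. (2007)


def fragment(sequence, size=FRAGMENT_LEN):
    """Cut a query genome into consecutive, non-overlapping fragments of equal size.

    A trailing piece shorter than `size` is discarded.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def ani(fragment_hits, min_identity=30.0, min_aligned_fraction=0.70):
    """Average the identity of the fragment alignments that pass both filters.

    `fragment_hits` is an iterable of (percent_identity, aligned_fraction)
    tuples, one per query fragment, describing its best match in the
    reference genome.
    """
    kept = [
        identity
        for identity, aligned_fraction in fragment_hits
        if identity > min_identity and aligned_fraction >= min_aligned_fraction
    ]
    if not kept:
        raise ValueError("no fragment alignment passed the filters")
    return sum(kept) / len(kept)
```

Each fragment would first be aligned against the reference genome with a nucleotide search tool, and the best match per fragment would then be passed to `ani`.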
In addition, the R. sullae IS123T proteins were mapped against the KEGG pathways database, and the result was normalized by the ribosomal protein counts. The resulting KEGG orthology assignment of genes and modules, and the corresponding rarefaction curves are shown in Supplementary Material S1, along with an annotated map of a prophage that was found in the R. sullae IS123T genome.
Core pan-genome analysis of the Rhizobiales was carried out to see whether any major genes were missing. The core was found to include 108 genes and the analysis indicated that the R. sullae IS123T genome presented here is more than 99% complete with respect to the core rhizobial pan-genome. This confirms both the trustworthiness of the ortholog analysis and the overall good quality of the genome coverage. The results of the pan-genome analysis are available as Supplementary Dataset 7. Functional categorization of the genes present in the IS123T strain based on the RAST annotation is shown in Figure 3. While this kind of analysis is often not exhaustive, as the output depends on the accuracy of the gene classification databases, a general picture can be drawn. The majority of its genes are predicted to be involved in carbohydrate metabolism, which is consistent with the rhizobial lifestyle, in which exopolysaccharides are important traits involved in host plant recognition, lectin binding, and attachment to root surfaces. In addition, a considerable array of genes relate to membrane transport, and their involvement in the secretion machinery for metabolites and signals is also important in plant-microbe interactions.
Sequence annotation confirmed the presence of genes involved in nodulation (nod) and symbiotic N2 fixation (nif and fix). The R. sullae type strain genome displayed all the essential nodulation genes, which are required for early symbiotic nodulation via nod factor production (Baev et al., 1992; Young et al., 2006). Among the nod genes, nodA was present in two copies, the first a short truncated form located in the nodABC operon, the second a supposedly functional full copy upstream of a gene for carbonic anhydrase. Opposite the truncated nodA and its nod box (the NodD regulator-binding cassette for nod genes) in the nodABC operon on the complementary strand, there is a copy of nodD interrupted by the insertion element ISRh1 (Meneghetti et al., 1996). Two other intact copies of nodD are located elsewhere, one of them close to its cognate receptor syrM, while nodB and nodC are present in the afore-mentioned operon, and nodH is immediately downstream of nodC. The nod factor export machinery also appears to be in place, due to the presence of the transport-related nodIJ operon, whose origin has been traced from Beta- to Alpha-Proteobacteria (Aoki et al., 2013). The presence of other defined nod genes means we can predict the following potential characteristics of the corresponding nod factors: regular acylation (presence of nodFE), O-acetylation (nodL and nodX), sulfation (nodH, nodPQ), and a putative O-carbamoylation (nolO). These features, encoding precise nod factor decorations, may account for the very tight host specificity of this rhizobium for its host plant H. coronarium.
Among the nif genes, which are involved in nitrogen fixation, we identified the structural nitrogenase units nifH, nifD, and nifK, plus regulatory and accessory determinants (nifA, nifB, nifE, nifN, nifS, nifT, nifU, nifW, nifZ, and the 4Fe-4S ferredoxin nitrogenase-associated gene). Among the fix genes, we detected fixL, fixS, and fixC, which potentially encode a symbiotically essential cbb3 high-affinity terminal oxidase, a rate-limiting component for symbiotic N2 fixation (Patschkowski et al., 1996). In fact, ab initio gene prediction using Prodigal identified three copies of the cbb3 high-affinity terminal oxidase in this R. sullae IS123T genome. In confirmation of the presence of other genes related to symbiosis, we also detected sufBCDE, the gene cluster acting as house-keeping storage genes of FeS clusters (Trotter et al., 2009). Furthermore, we found two copies of the gndA gene, which encodes 6-phosphogluconate dehydrogenase, an enzyme central to sugar catabolism in rhizobial species using the Entner-Doudoroff pathway, as previously reported for R. leguminosarum and R. etli (González et al., 2006; Young et al., 2006).
In order to assess the uniqueness of the genes and the level of conservation of those shared across related taxa, a whole genome orthology search was run on six additional rhizobial genomes. Besides R. sullae IS123T, we used the following species for the analysis (gene counts are indicated in parentheses): R. sullae WSM1592 (6985).
On inspecting all the genes known to correspond to symbiotic traits (nodulation, nitrogen fixation, or their ancillary metabolism), we found that, compared with the type strain R. sullae IS123T, its conspecific R. sullae WSM1592 lacks nifQ (the molybdenum donor for nitrogenase synthesis), the second copy of the structural nifH gene for nitrogenase, a copy of the fixN cytochrome c oxidase subunit, a copy of the fixL oxygen sensor protein, a copy of the fixK nitrogen fixation regulation protein, and the gene for ferredoxin I, a protein linked to the electron flux to nitrogenase for N2 reduction.
The symbiotic genes that are present in R. sullae IS123T but have no orthologs in either of the two species of the same genus that are closest to R. sullae according to 16S rRNA taxonomy, namely R. leguminosarum bv. viciae and R. etli, include, in addition to the above: the nodG (fabG) 3-oxoacyl-acyl-carrier-protein reductase; nodM, encoding functions for efficient Nod signal production; fixJ, the response regulator inducing nif operons in response to microaerobiosis within the nodule tissue; the putative nitrogen fixation protein gene fixT; a copy of the nod factor synthase nodA; a copy of the nodD2 and syrM genes encoding flavonoid-responsive transcriptional activators; and a copy of the nodU putative carbamoyl transferase.
Interestingly, the ortholog search across the seven taxa, including the three different genera Bradyrhizobium, Mesorhizobium, and Sinorhizobium (Ensifer), revealed the R. sullae IS123T genome to be the one displaying the highest number of genes related to symbiosis (59).
Since the host plant H. coronarium, as mentioned in the introduction, has remarkable properties of tolerance to several environmental stress factors, such as salinity, drought, and alkaline soil pH, and displays regular root nodulation under such conditions, we inspected the genome of its R. sullae microsymbiont to search for genes which could account for corresponding genotypes on the bacterial side.
We identified genes important in maintaining osmoregularity, which included the osmotically inducible protein C. However, in bacteria this gene is reported to be related not to saline conditions, but rather to responses to organic hydroperoxides and reactive oxygen species in general (Lesniak et al., 2002).
FIGURE 2 | Average Nucleotide Identity (ANI) plot between R. sullae IS123T and R. sullae WSM1592.
An integral membrane protein, YggT, involved in response to extracytoplasmic stress (osmotic shock) is present, but this gene appears to be widespread among bacteria, including Escherichia coli, and is not characteristic of isolates endowed with particular tolerances. The same goes for Aquaporin Z, relevant for osmoregulation but present in E. coli as well (Delamarche et al., 1999). In general, the R. sullae genome does not reveal its distinctiveness in terms of membrane physiology and selectivity, and it contains customary features, such as the osmolarity sensor envZ, the osmosensitive K+ channel histidine kinase KdpD, and the beta-(1->2) glucan export ATP-binding/permease protein NdvA, which is responsible for the synthesis of osmoregulated periplasmic glucans. These oligosaccharides are common not only to most rhizobia but to the whole Proteobacteria phylum, in which glucan concentration in the periplasm increases in response to a decrease in environmental osmolarity (Bohin, 2000). This finding is again not indicative of any adaptation to salty conditions as the response is instead to the opposite scenario (diluted circulating solution). The presence of this gene is nevertheless worth remarking on, as it is another corollary to the symbiotic proficiency of R. sullae. Indeed, mutations in the same ndvA gene in S. meliloti result in the delayed formation of numerous small white nodules that are not invaded by the mutant bacteria and are consequently unable to fix nitrogen (Ielpi et al., 1990).
Considering that the host plant is able to withstand 700 mM NaCl, and is regularly encountered with well-developed root nodules in the Tuscan Pliocene clays at pH 9.6, the absence of prominent traits of extremophily among the genes of its bacterial symbiont could appear to be in contrast to the habitats where the host plant and bacteria meet. However, it is interesting to recall that in a prior study on the stress tolerance of several strains of R. sullae, including the present type strain IS123T, compared with other rhizobia (Struffi et al., 1988), the R. sullae strains did not perform any better than the other rhizobia. Regarding NaCl, in most strains growth was limited to 0.5%, the same limit as in R. leguminosarum bv. trifolii (clover symbiont) and R. leguminosarum bv. viciae (pea symbiont). The limit was 1% in only two strains, but this was still lower than in S. meliloti (symbiont of alfalfa), which tolerated values up to 3%. As for alkali tolerance, R. sullae did not perform very differently from the others, and R. sullae IS123T, in particular, was able to resist up to pH 8.5, the same level attained by the pea and clover symbionts, and lower than that recorded for the alfalfa symbiont (Struffi et al., 1988). These data are consistent with the lack of obvious stress tolerance traits in R. sullae emerging from the present genome analysis, and allow for an interpretation focused on the ecological coexistence of tolerant hosts and non-tolerant microorganisms. In our earlier experiment with R. leguminosarum bv. trifolii, we demonstrated that the ability to survive in the presence of increasing doses of different heavy metals was dependent on the plant's phenotype, irrespective of the bacterial phenotype. For example, at chromium levels which were still tolerated by clover but far above the minimum inhibitory concentration for the rhizobium, the latter was nevertheless able to effectively nodulate the plant.
Interestingly, we also noticed that on curing those rhizobia of their large symbiotic plasmids, they lost the ability to induce and invade root nodules, but at the same time their intrinsic level of chromium tolerance increased. These data, along with the afore-mentioned absence of either genomic or phenotypic evidence for salt- or alkali-tolerance traits in R. sullae, lead us to the view that in these endophytic plant-microbe interactions, of the two partners the plant host is the one that is critically endowed with the ability to occupy challenging environments. In this sense, the symbiosis assumes the role of a shelter for the hosted microorganisms, which could otherwise not endure the same stress factors when facing them as free-living cells. In the case of heavy metals, as the study cited showed, it was also rather revealing to see that rhizobia could "opt" between entering the plant, which offers a shielded niche, and renouncing the whole symbiotic relationship by dropping an extrachromosomal replicon. The consequent loss of interactivity, due to the absence of the plasmid-borne genes, accounts for several membrane permeability changes, which are related to overall increased resistance to extracytoplasmic stress factors.
In the particular case of R. sullae, it should be added that its host, H. coronarium, has unique features of environmental adaptation, which, as far as is known, have not been encountered in any other plant. As we described in an earlier report (Tola et al., 2009), this genus is capable of forming modified lateral roots (called shovel roots) that accumulate calcium crystals, a mechanism that explains the plant's ability to grow, often as a unique vegetation form, in calcareous soils of extreme alkalinity. The scavenging of Ca2+ ions locally affects the chemical equilibrium of the soil carbonate buffer and allows efficient acidification of the rhizosphere, even in limestone-rich, highly basic soils. In the same report, we showed that H. coronarium is able to change the pH of its surrounding solution, even from alkalinity to acidity, while other legumes are only able to exert the reverse effect. These findings further support the evidence that it is the plant's task to withstand the harsh conditions characterizing its habitat. At the same time, the plant is able to improve the conditions in the root microenvironment, allowing its microbial partner to multiply in a situation it would otherwise not be able to withstand, and eventually to be fully rescued within the endophytic and symbiotic domains. The picture of this interaction is thus reconciled with the genome annotations of R. sullae, which, as mentioned above, featured genes that were not very different from those of most average bacteria in terms of coping with environmental stress, yet also featured a plethora of genes for symbiotic interactions.
The genome of R. sullae IS123T presented here draws attention to the way it differs from the available genome of the cognate strain R. sullae WSM1592, and has shown how the type strain appears to be endowed with a richer array of genes pertaining to the symbiotic phenotypes of nodulation and nitrogen fixation. Moreover, there are certain ecological aspects that allow us to make further inferences. These concern the migratory path of the host plant on its colonization route into Europe from the North African plains, which is presumed to coincide with its domestication and subsequent cropping. The type strain sequenced here was, in fact, isolated in the early seventies from a wild stand of its host in Cadiz province, the southernmost part of Spain, facing the Strait of Gibraltar and Africa. Strain WSM1592, instead, was recovered in 1995 from cultivated Sulla plants grown at the Ottava experimental station on the Italian island of Sardinia. The sites where the two strains were isolated within their host home range are shown on the map of the western Mediterranean area in Figure 4. Some geographical as well as ecological variables may account for the observed inter-strain variations between IS123T and WSM1592. In this regard, it is worth remarking that H. coronarium is one of the few plant species which still exist in both wild and cultivated conditions. This unique feature makes it possible to sample and compare the genomes of two strains, one from the wild (IS123T), the other from an agricultural context (WSM1592), and to investigate the subtle ways in which they differ in spite of an overall conserved genome. The host plant is considered to be native to Algeria, Tunisia and Morocco, as well as Spain (http://www.ildis.org/), where it is found essentially in the southernmost region. The presumed origin of the plant species in the north-western African belt above the Saharan Atlas range is supported by the high frequency of sites in which H. coronarium is encountered in natural populations in that area compared with European countries, and by the fact that the related species Hedysarum flexuosum is the only legume nodulated by rhizobia whose 16S rRNA sequence displays >99% similarity with that of the R. sullae type strain IS123T (Aliliche et al., 2016). These nodules are ineffective, as R. sullae is, as far as is currently known, fully symbiotic only with H. coronarium.
In conclusion, the present study enabled us to verify ecological correspondences between host-plant lifestyles and bacterial symbiont genotypes. In addition to the interest in comparing genomic features of isolates from cultivated vs. wild Sulla, the present genome, from a strain collected from a wild specimen thriving in an arid region of the Mediterranean, provided us with the possibility to investigate, and to rule out, the presence of genes related to drought and salt tolerance, which are two major characteristics of the naturally occurring ecotypes of its plant host. The range of the host plant extends throughout the near-desert belt in Northern Africa, and its invasion of Mediterranean Europe has apparently followed a route via Gibraltar, the transcontinental crossing point and the site where R. sullae IS123T was isolated.

AUTHOR CONTRIBUTIONS
AS and RM: conceived the project. Rv, RG, AG, and EP: performed the sequencing. GS, TS, NL, and RR: were responsible for the bioinformatics analysis. AS: wrote the manuscript.

ACKNOWLEDGMENTS
GS thanks the Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy, for computational support. Tessa Say is gratefully acknowledged for revising the English language style throughout the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2017.01348/full#supplementary-material