The Use of Transposon Insertion Sequencing to Interrogate the Core Functional Genome of the Legume Symbiont Rhizobium leguminosarum

The free-living legume symbiont Rhizobium leguminosarum is of significant economic value because of its ability to provide fixed nitrogen to globally important leguminous food crops, such as peas and lentils. Discovery-based research into the genetics and physiology of R. leguminosarum provides the foundational knowledge needed to understand the bacterium's complex lifestyle and to improve its use in agricultural settings. Transposon insertion sequencing (INSeq) enables high-throughput forward genetic screening at a genomic scale to identify individual genes required for growth in a specific environment. In this study we applied INSeq to screen the genome of R. leguminosarum bv. viciae strain 3841 (RLV3841) for genes required for growth on minimal medium containing mannitol. Results from this study were contrasted with a prior INSeq experiment screened on peptide-rich medium to identify a common set of functional genes necessary for basic physiology. Contrasting the two growth conditions indicated that approximately 10% of the chromosome was required for growth under both conditions. Genes essential for growth under only one of the two conditions were also identified. Data from INSeq screening on mannitol as a sole carbon source were used to reconstruct a metabolic map summarizing the growth-impaired phenotypes observed in the Embden-Meyerhof-Parnas pathway, Entner-Doudoroff pathway, pentose phosphate pathway, and tricarboxylic acid cycle. This revealed mannitol-dependent and mannitol-independent metabolic pathways required for growth, and identified metabolic steps with isozymes or possible carbon flux bypasses. Additionally, genes were identified on plasmids pRL11 and pRL12 that are likely to encode functions important to the central physiology of RLV3841.


INTRODUCTION
Rhizobium leguminosarum is a Gram-negative soil- and rhizosphere-colonizing bacterium that is also capable of engaging in endosymbiosis with specific leguminous plant genera. The host specificity of rhizobial infection depends on the exchange of specific chemical signals between the infecting bacterium and the host plant (Oldroyd et al., 2011), and R. leguminosarum is often divided into biovars based on host range. The biovar viciae comprises rhizobia capable of infecting leguminous plants such as peas (Pisum sativum), lentils (Lens culinaris), and vetch (Vicia spp.). In the endosymbiotic state, Rhizobium bacteroids reduce atmospheric nitrogen (N₂) to ammonia, which is then exported to the plant for assimilation. In return, the host plant provides fixed carbon and micronutrients to the bacteroids to sustain the symbiosis (Wielbo, 2012; Udvardi and Poole, 2013). The availability of symbiotically supplied nitrogen enables leguminous plants to satisfy their high nitrogen demands and has, in part, contributed to the evolutionary success and diversification of the Leguminosae (Doyle and Luckow, 2003).
R. leguminosarum bv. viciae 3841 (RLV3841) (Johnston and Beringer, 1975) has long been considered a model organism for Rhizobium research and was one of the first rhizobial strains with a published complete genome sequence (Young et al., 2006). Aside from the overarching agricultural context, RLV3841 offers other avenues of research because of its complex genome structure (Young et al., 2006) and its versatile physiology and lifestyle. RLV3841 has a relatively large bacterial genome comprising a single 5.1 Mb chromosome and 6 large, stably maintained plasmids ranging in size from 0.15 to 0.87 Mb. The RLV3841 genome is predicted to contain approximately 7346 genes; a substantial percentage of these (25.2%) are annotated as hypothetical genes of unknown function, warranting further investigation. High-throughput experimental approaches may help prioritize individual genes within this large, functionally uncharacterized group for further study.
The development of next-generation sequencing technologies has given rise to high-throughput methods of transposon (Tn) mutagenesis for studying gene function at a genome scale (Gawronski et al., 2009; Goodman et al., 2009; Langridge et al., 2009; van Opijnen et al., 2009). For example, INSeq was developed by introducing a type II restriction enzyme site within the inverted repeat (IR) of the himar1C9 mariner Tn, allowing specific capture and PCR amplification of the genomic DNA adjacent to the Tn insertion site (Goodman et al., 2011). Next-generation sequencing of PCR amplicons derived from DNA isolated from these Tn mutant libraries yields millions of Tn insertion tags, which can be mapped to the genome sequence and used to enumerate the relative abundance of individual Tn mutants within a mutant population (Barquist et al., 2013; van Opijnen and Camilli, 2013). INSeq and similar high-throughput Tn mutagenesis methods have been used to study the genetic basis of bacterial physiology (Griffin et al., 2011; Brutinel and Gralnick, 2012; Kuehl et al., 2014; Yang et al., 2014; Le Breton et al., 2015; Lee et al., 2015; Meeske et al., 2015; Pechter et al., 2015; Rubin et al., 2015; Hooven et al., 2016; Troy et al., 2016), bacterial resistance to biotic and abiotic factors (Gallagher et al., 2011; Khatiwara et al., 2012; Phan et al., 2013; Byrne et al., 2014; Murray et al., 2015; Shan et al., 2015; Yung et al., 2015; Tran et al., 2016), and colonization of hosts or specific environments (Gawronski et al., 2009; Dong et al., 2013; Kamp et al., 2013; Skurnik et al., 2013; Bishop et al., 2014; Johnson et al., 2014; Verhagen et al., 2014; Wang et al., 2014; Gutierrez et al., 2015; Moule et al., 2015; Turner et al., 2015; Capel et al., 2016). Recently, INSeq was adapted for use in the Rhizobiaceae and shown to be a suitable tool for high-throughput functional genomic screening in RLV3841 (Perry and Yost, 2014).
In this paper we used INSeq to define a core functional genome (CFG) of RLV3841 and to deconstruct central carbon metabolism during growth on mannitol, a preferred carbon source of rhizobia (Vincent, 1970; Geddes and Oresnik, 2014). By comparing the genes required for growth on mannitol with those required for growth on tryptone-yeast extract (TY) medium, we estimated a core set of functional genes required for optimal growth. Furthermore, the results of this study demonstrate that INSeq combined with growth on minimal medium is an effective approach for gaining new insight into central carbon metabolism in RLV3841.

Mariner Transposon Mutant Pool Generation and Mutant Selection
Mutant pools were generated as described in Perry and Yost (2014) with minor modifications. Briefly, RLV3841 and E. coli SM10λpir(pSAM_Rl) were grown in broth culture until late exponential phase. Donor (1.0 mL) and recipient (0.5 mL) cultures were mixed in a 1.5 mL microcentrifuge tube and pelleted at 12,000 × g for 3 min. The cell mixture was then washed twice with 1000 µL of 1X phosphate-buffered saline (PBS) and resuspended in a final volume of approximately 100 µL of 1X PBS. Six independent conjugations were spotted onto pre-warmed VMM-mannitol plates and incubated at 30 °C overnight (∼18 h). Following incubation, each of the 6 conjugation spots was scraped, resuspended in 1000 µL of 1X PBS, and pooled into a total volume of 6 mL, representing the RLV3841 Tn mutant library.
Selection of mutant pools on VMM-mannitol was conducted using six 245 × 245 mm (Corning) VMM-mannitol agar plates containing Neo and Str. For each selection plate, 500 µL of the RLV3841 mutant pool was spread-plated and allowed to dry. The plates were incubated at 30 °C for 72 h, representing between 15 and 18 generations of RLV3841 growth on minimal medium. Cells from each plate were harvested by scraping the thin film of cell growth and resuspending it in 5 mL of 1X PBS; the suspension was vortexed thoroughly to homogenize the cells, and a 1000 µL aliquot of each cell suspension was then used for cell pelleting and DNA isolation (Perry and Yost, 2014).
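The 15 to 18 generations quoted above follow the usual relation between the fold-increase in population size and the number of doublings (generations = log2 of the fold-increase). The source does not state how the estimate was obtained; the minimal sketch below simply assumes that cell numbers at plating and at harvest are known, and the function name is hypothetical.

```python
import math


def generations(initial_cells: float, final_cells: float) -> float:
    """Number of population doublings between two cell counts: log2(final / initial)."""
    if initial_cells <= 0 or final_cells <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(final_cells / initial_cells)


# A 2**15-fold (32,768-fold) increase corresponds to 15 generations.
print(generations(1e5, 1e5 * 2**15))  # 15.0
```

Read the other way, 15 to 18 generations correspond to roughly a 3 × 10⁴- to 2.6 × 10⁵-fold expansion of the plated pool.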

Transposon Insertion Sequencing
Mutant pools recovered from the six selection plates were combined two plates at a time into 3 independent technical replicates for DNA extraction and Tn insertion sequencing. The method used for library preparation and sequencing is described by Perry and Yost (2014), with modification of the adaptor sequences INSeq_Adpt_Top and INSeq_Adpt_Bottom (Table S1). The final library concentrations of the 3 stock library preparations were 1.29, 1.21, and 1.39 ng/µL after size selection. DNA sequencing was performed on an Ion Torrent PGM using 200 bp sequencing chemistry and a 316v2 sequencing chip. The raw sequence output for the 3 technical replicates was 1.4, 1.1, and 1.1 M reads, respectively, and can be found under SRA accession numbers SRR3400585-7 (TY datasets are deposited under SRR3400588-90). Sequencing data from the 3 technical replicates were pooled, for a combined total of 3.6 million reads, to achieve sufficient read depth for hidden Markov model (HMM) analysis (DeJesus and Ioerger, 2013). Raw sequencing data were processed, aligned, and analyzed as previously described (Perry and Yost, 2014). Briefly, raw reads were clipped at the end of the pSAM_Rl mariner IR element and clipped at the 3′ end at the beginning of the INSeq_Adpt sequence. The resulting reads were screened for the presence of a 5′-TA insertion site and a length of ≥15 bp. Trimmed reads were aligned to the RLV3841 reference genome using bowtie, with the option enabled that suppresses reads with multiple alignments from the output file. The alignment files were then converted to wig files and analyzed using the tn-hmm.py Python module. After quality filtering and discarding of unaligned reads, the pipeline yielded a total of 2,374,819 reads aligned to the RLV3841 reference genome. The HMM then assigned each "TA" insertion site to one of four growth states, which were used to assign each gene a specific growth phenotype (DeJesus and Ioerger, 2013).
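The read-filtering steps described above (clip at the end of the transposon IR, clip the 3′ end at the adaptor, keep tags that start with TA and are at least 15 bp long) can be sketched in Python as follows. This is an illustrative reimplementation, not the Perry and Yost (2014) pipeline: it uses exact string matching, whereas real pipelines may tolerate sequencing errors, and `IR_END` and `ADAPTER_PREFIX` are hypothetical placeholder sequences standing in for the pSAM_Rl IR end and INSeq_Adpt sequences listed in Table S1.

```python
from typing import Optional

# HYPOTHETICAL placeholder sequences: the real pSAM_Rl IR end and the
# INSeq_Adpt adaptor sequences are given in Table S1 of the study.
IR_END = "ACGTTGACA"
ADAPTER_PREFIX = "CCGGAATT"


def trim_read(read: str, ir_end: str = IR_END, adapter_prefix: str = ADAPTER_PREFIX,
              min_len: int = 15) -> Optional[str]:
    """Return the genomic tag flanking the transposon, or None if the read fails a filter."""
    start = read.find(ir_end)
    if start == -1:
        return None                      # no transposon IR end: not an insertion read
    tag = read[start + len(ir_end):]     # clip everything up to the end of the IR
    stop = tag.find(adapter_prefix)
    if stop != -1:
        tag = tag[:stop]                 # clip the 3' end at the start of the adaptor
    if not tag.startswith("TA"):
        return None                      # mariner inserts only at 5'-TA sites
    if len(tag) < min_len:
        return None                      # too short to align reliably
    return tag


read = "NNN" + IR_END + "TAGCTAGCTAGCTAGCA" + ADAPTER_PREFIX + "GG"
print(trim_read(read))  # TAGCTAGCTAGCTAGCA
```

The surviving tags would then be aligned (bowtie in the study) and tallied per TA site before HMM analysis.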

Curation of INSeq Data
Outputs from the HMM for both the TY (Perry and Yost, 2014) and VMM datasets were combined based on RLV3841 locus number. Riley functional classifications for each gene were obtained from the lab of Philip Poole, University of Oxford (http://rhizosphere.org/lab-page/molecular-tools/genomes/rlv3841-genome), and appended to the dataset. For all genes with an insertion density <0.30, duplicate gene sequences were manually examined by reciprocal BLAST against the RLV3841 reference genome, to avoid misclassifying a gene as essential when its lack of mapped insertions resulted from the alignment penalty for reads with multiple mapping locations. The final compiled and curated dataset is found in Supplementary File 1.
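The <0.30 insertion-density screen can be sketched as follows, assuming the conventional definition of insertion density: the fraction of a gene's TA sites that carry at least one mapped insertion read. Because TA is its own reverse complement, scanning one strand finds every mariner target site. The function names are hypothetical, and the sketch only flags candidates; the reciprocal BLAST check itself was done manually in the study.

```python
from typing import List, Optional, Sequence


def ta_sites(seq: str) -> List[int]:
    """0-based positions of every 'TA' dinucleotide (potential mariner insertion site)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def insertion_density(read_counts: Sequence[int]) -> Optional[float]:
    """Fraction of TA sites with >= 1 read; None for genes without TA sites."""
    if not read_counts:
        return None
    return sum(1 for c in read_counts if c > 0) / len(read_counts)


def flag_for_duplicate_check(read_counts: Sequence[int], threshold: float = 0.30) -> bool:
    """True if a gene's insertion density is below the threshold used in the text."""
    return bool(read_counts) and insertion_density(read_counts) < threshold


print(ta_sites("GGTACATAA"))                  # [2, 6]
print(flag_for_duplicate_check([0, 0, 0, 1])) # True (density 0.25)
```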

Insertional Mutagenesis of RL0920 and RL3335
Two previously uncharacterized VMM-mannitol growth-impaired genes (RL0920 and RL3335) were mutated to verify the INSeq growth phenotype data. Mutants were created by single-crossover mutagenesis with pJQ200SK (Quandt and Hynes, 1993), as described in Vanderlinde et al. (2010). Briefly, a 563 bp internal fragment of RL0920 was PCR amplified using primers RL0920_Fwd and RL0920_Rev, which introduced 5′ ApaI and 3′ SpeI restriction sites. The 563 bp amplicon was then directionally cloned into pJQ200SK using ApaI and SpeI. The resulting vector, pJQ200SK-RL0920, was conjugated into wildtype RLV3841 using E. coli strain S17-1; single-crossover mutants were selected on TYSmGm and then screened for sucrose sensitivity to confirm plasmid integration. The same procedure was used to create a single-crossover mutant in RL3335, using primers RL3335_Fwd and RL3335_Rev to generate a 603 bp internal gene fragment for cloning into pJQ200SK (Table S1). The resulting mutants in RL0920 and RL3335 were named MA0920 and MA3335, respectively.
Growth Curve Analysis of MA0920, MA3335, and RLV3841 Wildtype
Growth curves of RLV3841, MA0920, and MA3335 were performed using a Synergy HT microplate reader (Biotek) with a shaking head, with 250 µL of inoculated growth medium per well in a 96-well Nunc® Optical Bottom Plate (Thermo Scientific) and a 40 µL Anti-Evaporation Oil (Ibidi) overlay. Inoculated growth medium was prepared by scraping cells from freshly grown TYSm or SmGm plates, washing them twice with 1X PBS, and standardizing the cell suspension to an approximate initial OD600 of 0.01. Cells were grown at 30 °C with 10 min of shaking followed by OD600 measurements every 30 min, for 72 h. Each growth curve was derived from the mean OD600 of 7 replicates. Mean generation times (MGT) were calculated in early exponential phase (OD600 < 0.100) and late exponential phase (0.100 < OD600 < 0.200) as the average time required for the optical density of the cultures to double within each defined growth phase.
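The text defines MGT as the average time needed for OD600 to double within each window. The sketch below swaps in a closely related, standard estimator: the inverse slope of a least-squares fit of log2(OD600) against time, which equals the doubling time when growth in the window is exponential. It is an illustration under that assumption, not the authors' exact calculation, and the function name is hypothetical.

```python
import math
from typing import Sequence


def mean_generation_time(times_h: Sequence[float], od600: Sequence[float],
                         od_low: float, od_high: float) -> float:
    """Doubling time (h) from a least-squares fit of log2(OD600) against time.

    Only readings with od_low <= OD600 < od_high are used, mirroring the
    early (OD600 < 0.100) and late (0.100-0.200) windows in the text.
    """
    pts = [(t, math.log2(od)) for t, od in zip(times_h, od600)
           if od > 0 and od_low <= od < od_high]
    if len(pts) < 2:
        raise ValueError("need at least two readings inside the OD window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    doublings_per_h = sxy / sxx          # slope of log2(OD) vs time
    if doublings_per_h <= 0:
        raise ValueError("OD600 is not increasing in this window")
    return 1.0 / doublings_per_h


# Synthetic culture read every 30 min for 72 h, doubling every 3 h.
times = [0.5 * i for i in range(145)]
ods = [0.01 * 2 ** (t / 3.0) for t in times]
print(round(mean_generation_time(times, ods, 0.0, 0.100), 3))  # 3.0
```

A regression uses every reading in the window rather than two end points, which makes the estimate less sensitive to single noisy plate-reader measurements.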

Tn Insertion Sequencing and Transposition Summary of VMM-Mannitol Mutant Pools
The RLV3841 genome contains 140,056 potential mariner insertion sites distributed across the chromosome and 6 megaplasmids. Insertion densities ranged from 0.65 to 0.86 across the 7 replicons, with an average insertion density of 0.80 (Table S2). These insertion densities were similar to those previously observed in an INSeq experiment using TY medium for selection (Perry and Yost, 2014). HMM analysis of the VMM INSeq data assigned 7.2% and 2.6% of genes to the essential (ES) and growth defective (GD) phenotypes, respectively (Table S3). Given sufficient generations of growth, GD mutants would be lost from the mutant population; therefore, the ES and GD states were pooled into a single growth impaired (GI) category (9.8% of genes) for annotation with Riley functional groupings. Overall, 87.1% of genes had no impact on growth when mutated (NE), and 1.20% of genes became over-represented within the mutant population when carrying Tn insertions and were therefore predicted to confer a growth advantage (GA) phenotype (Table S3). To simplify interpretation of the INSeq data by focusing exclusively on loss-of-function phenotypes, the NE and GA genes were pooled into a single category termed growth neutral (GN). Within the genome, 1.6% of genes had sequence duplications, so no information on their impact on growth could be obtained because of the penalty imposed on Tn insertion tags with multiple mapping locations. In addition, 0.3% of genes lack a "TA" dinucleotide motif, leaving them without a target site for mariner Tn insertion.
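The HMM labels individual TA sites, so turning site states into a per-gene call requires a rule. The sketch below assumes a simple majority vote (which may differ from what tn-hmm.py actually does) and then applies the pooling described above: ES + GD → GI and NE + GA → GN. Function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Pooling described in the text: ES + GD -> GI, NE + GA -> GN.
POOLED = {"ES": "GI", "GD": "GI", "NE": "GN", "GA": "GN"}


def gene_state(site_states: Sequence[str]) -> Optional[str]:
    """ASSUMED rule: a gene takes the most frequent HMM state among its TA sites.

    Ties go to the state encountered first (Counter.most_common ordering).
    Genes without TA sites cannot be assayed and return None.
    """
    if not site_states:
        return None
    return Counter(site_states).most_common(1)[0][0]


def pooled_category(site_states: Sequence[str]) -> Optional[str]:
    """Map a gene's HMM state onto the GI/GN categories used in the analysis."""
    state = gene_state(site_states)
    return None if state is None else POOLED[state]


print(pooled_category(["ES", "ES", "NE"]))  # GI
print(pooled_category(["NE", "GA", "GA"]))  # GN
```

Applied to the reported VMM percentages, this pooling gives 7.2% + 2.6% = 9.8% GI and 87.1% + 1.20% = 88.3% GN; the remaining 1.9% are the duplicated and TA-less genes that cannot be assayed.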
The RLV3841 CFG, VMM-Growth Impaired, and TY-Growth Impaired Phenotypes
Comparison of the VMM-mannitol INSeq dataset with the TY INSeq dataset identified a set of 491 genes that conferred a GI phenotype under both conditions when mutated; these genes were assigned to the CFG (Figure 1). In contrast, Tn insertions in 170 and 72 genes resulted in a condition-dependent impairment of growth on VMM (VGI) and TY (TGI), respectively. The CFG genes fell mainly into 5 major Riley classification groups: macromolecule synthesis and metabolism (20.0%), energy and carbon metabolism (10.2%), ribosomal constituents (9.8%), cell envelope (9.4%), and conserved hypothetical proteins (9.2%) (Figure 2). VGI genes fell mainly into 3 major functional groups: metabolism of amino acids (17.6%), biosynthesis of cofactors and carriers (14.7%), and nucleotide biosynthesis (11.8%). The 4 major groups of TGI genes were hypothetical proteins (27.8%), cell envelope (15.3%), macromolecule synthesis and metabolism (12.5%), and transport and binding proteins (11.1%).
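The CFG/VGI/TGI assignment shown in Figure 1 is a set partition of the genes called GI on each medium, after removing genes that cannot be assayed (no TA site or duplicated sequence). A minimal sketch; the function and the locus names in the example are hypothetical placeholders.

```python
from typing import AbstractSet, Dict, Set


def partition_gi(ty_gi: AbstractSet[str], vmm_gi: AbstractSet[str],
                 excluded: AbstractSet[str] = frozenset()) -> Dict[str, Set[str]]:
    """Split growth-impaired genes into CFG (both media), VGI (VMM only), and TGI (TY only)."""
    ty = set(ty_gi) - set(excluded)
    vmm = set(vmm_gi) - set(excluded)
    return {"CFG": ty & vmm, "VGI": vmm - ty, "TGI": ty - vmm}


# Placeholder locus names; "g9" stands for a duplicated gene that cannot be assayed.
parts = partition_gi({"g1", "g2", "g3", "g9"}, {"g2", "g3", "g4", "g9"}, excluded={"g9"})
print(sorted(parts["CFG"]), sorted(parts["VGI"]), sorted(parts["TGI"]))
# ['g2', 'g3'] ['g4'] ['g1']
```

With the counts reported here, CFG plus VGI (491 + 170) gives the 661 genes that are GI on VMM, and CFG plus TGI (491 + 72) gives the 563 genes that are GI on TY.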
Open reading frames annotated as hypothetical proteins represent approximately 25.2% of the RLV3841 genome. Of these, 103 hypothetical proteins were observed to have a GI phenotype on VMM, TY, or both growth conditions (Table 1). Because of sequence duplication within the RLV3841 genome, 20 annotated hypothetical proteins could not be assayed for a growth phenotype using INSeq, owing to multiple potential alignments of the sequenced Tn insertion tags.
FIGURE 1 | Venn diagram of growth impaired genes observed for growth on TY and VMM-mannitol. Growth impaired genes observed uniquely on TY or VMM were assigned to TY-growth impaired or VMM-growth impaired, respectively. Growth impaired genes observed in both treatments were assigned to the CFG. Genes without potential mariner insertion sites, or with highly similar duplicate sequences in the RLV3841 genome, were excluded.

Growth Curve Analysis of Two Predicted VMM-Growth Impaired Mutants
Growth curves of RLV3841, MA0920 (RL0920 predicted VGI by INSeq), and MA3335 (RL3335 predicted VGI by INSeq) in TY and VMM-mannitol broth over 72 h are shown in Figure 3. After 72 h of growth, RLV3841, MA0920, and MA3335 reached final mean OD600 readings of 0.696, 0.657, and 0.628 in TY, and 0.588, 0.179, and 0.198 in VMM, respectively. Mean generation times for RLV3841, MA0920, and MA3335 in TY were 2.5, 3.0, and 3.0 h in early exponential phase, and 3.5, 4.5, and 4.0 h in late exponential phase. In VMM-mannitol, the early exponential phase MGTs were 6.0, 7.5, and 8.5 h for RLV3841, MA0920, and MA3335, respectively. In late exponential phase, MA0920 and MA3335 halted growth and did not complete an additional doubling, whereas wildtype RLV3841 continued to double with an MGT of 9.5 h.
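Normalizing each mutant's final OD600 to the wildtype in the same medium makes the contrast between the two media explicit. The values below are copied from the results above; the dictionary and function names are illustrative.

```python
# Final mean OD600 after 72 h, as reported above.
FINAL_OD600 = {
    "TY":  {"RLV3841": 0.696, "MA0920": 0.657, "MA3335": 0.628},
    "VMM": {"RLV3841": 0.588, "MA0920": 0.179, "MA3335": 0.198},
}


def relative_to_wildtype(medium: str, wildtype: str = "RLV3841") -> dict:
    """Final OD600 of each mutant divided by the wildtype final OD600 in the same medium."""
    wt = FINAL_OD600[medium][wildtype]
    return {strain: od / wt for strain, od in FINAL_OD600[medium].items() if strain != wildtype}


for medium in ("TY", "VMM"):
    print(medium, {s: round(r, 2) for s, r in relative_to_wildtype(medium).items()})
# TY {'MA0920': 0.94, 'MA3335': 0.9}
# VMM {'MA0920': 0.3, 'MA3335': 0.34}
```

Both mutants reach at least 90% of wildtype density in TY but only 30% to 34% in VMM-mannitol.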

The Genetics of Central Carbon Metabolism for Growth on Mannitol
INSeq was used to identify a potential minimal central carbon metabolism pathway for growth on mannitol. Figure 4 and the accompanying Table 2 provide a metabolic map illustrating the interconnections of the Embden-Meyerhof-Parnas (EMP) pathway, Entner-Doudoroff (ED) pathway, pentose phosphate (PP) pathway, and tricarboxylic acid (TCA) cycle that compose central carbon metabolism, overlaid with the observed growth impaired phenotypes. Genes required for mannitol uptake and conversion to fructose-6P (F6P) were VGI. Within the ED pathway, genes required for conversion of F6P to pyruvate were VGI. Genes required for conversion of F6P into glyceraldehyde-3P (GA3P) via the upper EMP pathway had no impact on growth when mutated. Genes required for the conversion of GA3P to pyruvate as part of the lower EMP pathway were VGI when mutated. Assimilation of pyruvate into the TCA cycle was VGI via more than one metabolic pathway. Mutation of genes within the TCA cycle resulted in GI or VGI phenotypes, with the exception of a growth neutral step at the conversion of fumarate to malate (Figure 4; Table 2).

Plasmid Growth Impaired Genes
Megaplasmid genes that resulted in a growth impaired phenotype when mutated were assigned as plasmid growth impaired (PGI), while plasmid genes whose mutation impaired growth uniquely on VMM or TY were designated PVGI and PTGI, respectively. Collectively, 48 genes distributed across the 6 megaplasmids were predicted to result in a PGI phenotype when mutated (Supplementary File 1). The 48 PGI genes spanned 11 Riley functional classes (Figure 5). Each of the 6 plasmids carried a set of 3 replication protein-encoding genes categorized as PGI. Plasmid pRL11 carried the most PGI biosynthetic genes, including 6 genes of an 8-gene cluster predicted to encode cobalamin biosynthesis (Figure 5).
FIGURE 2 | Relative distributions of Riley functional gene classifications within growth impaired categories. Growth impaired genes on both VMM-Mannitol and TY media were assigned to the CFG. CFG, TY-growth impaired, and VMM-mannitol growth impaired genes were assigned Riley functional classifications based on Young et al. (2006). The relative abundance of genes within observed Riley functional groups was then calculated within each growth impaired category.
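Spotting features such as the three-gene rep cluster on each plasmid amounts to scanning each replicon for consecutive GI calls. The minimal sketch below assumes a hypothetical input format of (locus, category) pairs in genomic order; it does not handle a run that wraps around the origin of a circular replicon.

```python
from typing import List, Sequence, Tuple


def gi_runs(calls: Sequence[Tuple[str, str]], min_len: int = 3) -> List[List[str]]:
    """Return runs of >= min_len consecutive GI genes along one replicon.

    `calls` holds (locus_tag, category) pairs in genomic order, where the
    category is "GI" or "GN" (hypothetical input format).
    """
    runs: List[List[str]] = []
    current: List[str] = []
    for locus, category in calls:
        if category == "GI":
            current.append(locus)
            continue
        if len(current) >= min_len:   # a GN gene closes the current run
            runs.append(current)
        current = []
    if len(current) >= min_len:       # run reaching the end of the list
        runs.append(current)
    return runs


calls = [("p1", "GN"), ("p2", "GI"), ("p3", "GI"), ("p4", "GI"),
         ("p5", "GN"), ("p6", "GI"), ("p7", "GI")]
print(gi_runs(calls))  # [['p2', 'p3', 'p4']]
```

Loosening `min_len` (or allowing a single GN gap) would be needed to pick up partially GI clusters such as the 6-of-8 cobalamin genes on pRL11.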

DISCUSSION
The CFG of RLV3841
Young et al. (2006) used the phylogeny of conserved genes and GC% to describe a core and an accessory genome within RLV3841. The present study refines the distinction between RLV3841 core and accessory genes by using functional genetic screening. We propose that RLV3841 has a CFG, which can be defined as the core set of genes required for normal growth independent of any specific environmental condition. In this study we approximate the CFG of RLV3841 by contrasting INSeq data sets generated from growth on complex peptide-rich medium and on minimal defined medium with mannitol as the sole carbon source. Cross-referencing the chromosomal genes that were GI on VMM and on TY identified an overlapping set of 491 genes that we putatively assigned to the CFG of RLV3841, as their loss of function resulted in a GI phenotype that appears to be independent of the growth medium used. The number of CFG genes was smaller than the total number of genes that were GI on TY (563 genes) or on VMM (661 genes) (Figure 1). This is to be expected if the CFG represents a central set of genes required for core cellular functions.
Defining a CFG in RLV3841 provides context for subsequent INSeq and classical genetic studies. For example, the CFG described for RLV3841 will help determine whether a mutation resulting in a GI phenotype in a plant-associated environment impairs some aspect of the RLV3841 CFG or instead affects a plant-specific interaction. This is particularly important for genes encoding hypothetical proteins of unknown function. Summarizing the distribution of gene functions in the CFG identified 20 functional groupings (Figure 2). Five major categories accounted for over half of the total CFG (287 genes): macromolecule synthesis and modification (98 genes), energy and carbon metabolism (50 genes), ribosome constituents (48 genes), cell envelope (46 genes), and hypothetical proteins (45 genes) (Figure 2). Hypothetical proteins aside, these 4 categories logically make up a large fraction of the CFG, as they represent the central genes required for the synthesis of the major cellular components, the central conversion of carbon for generation of reductant and ATP, the production and modification of the protein synthesis machinery, and the synthesis of the cell envelope.
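The gene counts cited in this Discussion can be cross-checked against the Riley percentages reported in the Results by converting each percentage back to a rounded count. For the CFG, 9.2% of 491 genes corresponds to 45 conserved hypothetical proteins, and the five major categories together sum to 287 genes; the VGI and TGI conversions reproduce the counts given for those categories below. The dictionary keys are abbreviated labels, not the exact Riley class names.

```python
# Category sizes and Riley-class percentages as reported in the Results.
CATEGORY_SIZES = {"CFG": 491, "VGI": 170, "TGI": 72}
PERCENTAGES = {
    "CFG": {"macromolecule synthesis": 20.0, "energy and carbon metabolism": 10.2,
            "ribosomal constituents": 9.8, "cell envelope": 9.4,
            "hypothetical proteins": 9.2},
    "VGI": {"amino acids": 17.6, "cofactors and carriers": 14.7, "nucleotides": 11.8},
    "TGI": {"hypothetical proteins": 27.8, "cell envelope": 15.3,
            "macromolecule synthesis": 12.5, "transport and binding": 11.1},
}


def counts_from_percentages(category: str) -> dict:
    """Convert reported percentages back to rounded gene counts."""
    total = CATEGORY_SIZES[category]
    return {name: round(total * pct / 100) for name, pct in PERCENTAGES[category].items()}


cfg = counts_from_percentages("CFG")
print(list(cfg.values()), sum(cfg.values()))  # [98, 50, 48, 46, 45] 287
```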
That genes encoding hypothetical proteins formed one of the five major categories assigned to the CFG reinforces the broadly acknowledged observation among geneticists that the functions of many genes involved in core cellular processes remain uncharacterized. INSeq is a powerful technique for identifying hypothetical proteins required for survival under specific growth conditions, and ultimately will accelerate discovery in this large and understudied category of genes. For example, in this study we identified a total of 103 hypothetical proteins with an impaired growth phenotype under at least one growth condition; of these, 19 were VGI, 20 were TGI, 14 were plasmid-associated, and the remainder belonged to the CFG (Table 1).

Chromosomal VMM and TY Specific Growth Impairment
Growth on minimal medium requires the biosynthesis of several metabolites that can be scavenged from a complex growth medium. Therefore, it was expected that there would be more VGI than TGI genes, and that a substantial portion of the VGI genes would be functionally classified in the biosynthesis of amino acids (30 genes), cofactors and carriers (25 genes), nucleotides (20 genes), and metabolic intermediates (15 genes) (Figure 2). To further interrogate the VGI dataset, two previously uncharacterized genes, RL0920, a putative ATP-binding Mrp family protein, and RL3335, a putative lysophospholipase, were selected for targeted mutagenesis. Growth curves in liquid culture were used to characterize the generation times and growth response of the mutants. As expected, MA0920 and MA3335 were substantially growth impaired, with increased generation times in VMM-mannitol. After 72 h of growth, the mutant cultures reached approximately one third of the wildtype density in VMM-mannitol, whereas in TY they reached densities similar to the wildtype (Figure 3).
The TGI category contained the fewest genes and the least functional complexity (Figure 2). Unlike the VGI genes, the TGI genes do not generally hinge on metabolic biosynthesis, which is not surprising, as complex media contain many, if not all, of the metabolic intermediates required for growth. The largest functional categories among the TGI genes were hypothetical proteins (20 genes), cell envelope (11 genes), macromolecule synthesis and modification (9 genes), and transport and binding proteins (8 genes). Previous studies have identified several TGI genes, which collectively are implicated in outer membrane integrity or periplasmic function, suggesting that growth on complex media may require specific envelope …
… (Fuhrer et al., 2005; Geddes and Oresnik, 2014). Amino acid precursors are indicated in bold text. The impact of mutations in each metabolic step on growth was determined by contrasting results from the TY and VMM-mannitol INSeq experiments. Genes observed to be growth defective or essential exclusively on VMM-mannitol were concluded to be involved in central carbon metabolism of mannitol. Genes observed to be GD or ES on both TY and VMM-mannitol were grouped and assumed to have roles in central carbon metabolism that were not mannitol dependent.

Central Carbon Metabolism of Mannitol
Using INSeq we were able to screen the 4 major conserved central carbon metabolic pathways for genes required for growth on mannitol (Figure 4; Table 2). As expected, disruption of the genes required for mannitol transport and conversion to F6P, along with glucose-6-phosphate isomerase (pgi), which is required for conversion of F6P into glucose-6P (G6P) (Keele et al., 1969; Arias et al., 1979), resulted in a GI phenotype. Genes required for conversion of G6P into gluconate-6P (GN6P) through the ED pathway (overlapping the oxidative branch of the PP pathway) resulted in a GI phenotype when mutated on both TY and VMM (Figure 4: reactions 1.1-2). The essential nature of these reactions may be due to several factors: (1) the NADPH generated during conversion (Spaans et al., 2015), (2) the possibly toxic accumulation of phosphorylated intermediates (Cerveñanský and Arias, 1984; Kadner et al., 1992), (3) the role of G6P in the biosynthesis of osmoprotectants (Barra et al., 2003), and (4) the need for carbon flux into the ED pathway for glycolytic growth (Arias et al., 1979; Glenn et al., 1984; Stowers, 1985). Conversion of GN6P into pyruvate through the ED pathway was VGI (Figure 4: reactions 1.3-4). Mutations in the upper EMP pathway, aside from pgi, were neutral, in agreement with previous work (Glenn et al., 1984). The lower EMP pathway (sometimes considered shared with the ED pathway) converts GA3P into pyruvate; mutants at all of its enzymatic steps appeared GI on mannitol, and mutants at some steps also appeared GI on TY (Figure 4: reactions 2.5-9). The VGI nature of the lower EMP pathway during growth on mannitol is possibly due to mutants being unable to metabolize GA3P produced by the ED pathway into the amino acid precursors glycerate-3P (G3P) or phosphoenolpyruvate (PEP), or to catabolize GA3P into pyruvate (Finan et al., 1988) for use in the TCA cycle.
… Plasmid-borne genes observed to be growth defective or essential for growth on both TY and VMM-Mannitol were grouped. The contribution of growth defective or essential genes to megaplasmid stability, or to growth of RLV3841 at an organism level, cannot be discriminated by INSeq analysis alone; therefore, the grouped growth defective and essential genes were assigned to plasmid growth impaired. Riley functional classifications were assigned to PGI genes for comparison of PGI profiles between megaplasmids.
The GI phenotype of mutants throughout the ED pathway, in contrast to the EMP pathway, suggests that the ED pathway is the central route for glycolytic conversion of carbon into the central intermediate pyruvate. This is in agreement with previous research indicating that the ED pathway is the preferred route of carbon metabolism for glycolytic growth of rhizobia and closely related genera (Stowers, 1985; Fuhrer et al., 2005; Geddes and Oresnik, 2014). Entry of pyruvate into the TCA cycle was observed to proceed via 2 distinct GI routes (Figure 4: reactions 3.1-14). The route through acetyl-CoA is less direct than the conversion of pyruvate into oxaloacetate, and mutants in pyruvate carboxylase (Figure 4: reaction 3.14) were more severely GI than mutants in citrate synthase (Figure 4: reaction 3.2). Additionally, on TY medium, mutants in pyruvate dehydrogenase appeared GI, while mutants in pyruvate carboxylase did not. These findings are in agreement with anaplerotic production of oxaloacetate (OAA) via pyc-mediated fixation of CO₂ being important for replenishing OAA pools under minimal growth conditions (Gokarn et al., 2001; Sirithanakorn et al., 2014).
When grown on mannitol as a sole carbon source, mutants in TCA cycle genes were all GI, except at the conversion of fumarate to malate (Figure 4: reactions 3.2-3.9). Mutants in fumC and fumA (RL2701 and RL2703) were growth neutral, possibly due to functional redundancy between the two isozymes, as previously reported in Bradyrhizobium japonicum (Acuña et al., 1991). Future INSeq studies with prolonged exposure to selective pressure may identify which isozyme is dominant. Several TCA steps were GI on both VMM-mannitol and TY media, confirming that the TCA cycle is an important component of the CFG in RLV3841. Mutants in isocitrate dehydrogenase (icd) were GI on both VMM-mannitol and TY (Figure 4: reaction 3.4). icd mutants have previously been shown to develop glutamate auxotrophy (McDermott and Kahn, 1992), which may explain the impaired growth phenotype on minimal medium but does not explain growth impairment on peptide-rich TY medium. Mutants in sucB and citM (sucA) were also growth impaired on both media (Figure 4: reaction 3.5). For growth on VMM-mannitol, the GI phenotype of these mutants is possibly due to accumulation of α-ketoglutarate that cannot be converted into succinyl-CoA, and possibly to perturbation of the GOGAT cycle through shunting of excess α-ketoglutarate (Bravo and Mora, 1988; Dunn, 1998).
The genes encoding enzymes of the glyoxylate bypass, phosphoenolpyruvate carboxykinase, and fructose bisphosphate aldolase were GN on VMM-mannitol and TY media, suggesting that gluconeogenesis is not required for growth in either condition (Kornberg, 1966; McKay et al., 1985; Stowers, 1985). This seems reasonable: growth on mannitol is presumably glycolytic, so the required sugar phosphates can be derived from metabolic intermediates generated during the breakdown of mannitol. On TY medium, many carbohydrates and polyols are likely already present in trace amounts, mitigating the need for gluconeogenesis.
Almost every gene in the non-oxidative branch of the PP pathway had a neutral impact on growth when mutated. The only two reactions in the PP pathway that were GI on mannitol were the conversions of ribulose-5P into ribose-5P and into xylulose-5P (Figure 4: reactions 4.2 and 4.3). It is logical that mutation of ribose-5-phosphate isomerase A appears GI, as ribose-5P is a precursor of 5-phosphoribosyl-α-1-pyrophosphate (PRPP), the branch point for carbon flux into nucleotide, histidine, nicotinamide, and tryptophan biosynthesis (Kilstrup et al., 2005; Switzer, 2009). The conversion of GN6P into ribulose-5P, however, was GN. There are two possible explanations for mutants in this step appearing GN: (1) functional redundancy between the isozymes Gnt and GntZ compensates for mutation of either one (Figure 4: reaction 4.1), or (2) ribulose-5P can be replenished from xylulose-5P, derived either from F6P and GA3P shunted into the PP pathway (Figure 4: reactions 4.4-6) or from the phosphoketolase pathway (EC 4.1.2.9). In general, the interconnectedness of the PP pathway makes single-gene knockouts difficult to study, as mutants may adapt to interrupted pathways by using alternative metabolic routes or isozymes (Geddes and Oresnik, 2014).
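Both explanations offered above (isozyme redundancy and a metabolic bypass), as well as the fumA/fumC redundancy discussed for the TCA cycle, can be made explicit with a toy network-expansion model: a reaction fires if at least one of its genes survives (isozymes act as OR) and all of its substrates are available. The reaction list is a hypothetical simplification of Figure 4, not the authors' model. Reversible steps are written in one direction only, and `tkt` and `rpe` are hypothetical gene labels for the lumped non-oxidative shunt and the epimerase step.

```python
from typing import AbstractSet, FrozenSet, List, Set, Tuple

# (substrates, products, catalysing genes)
Reaction = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

REACTIONS: List[Reaction] = [
    (frozenset({"GN6P"}), frozenset({"Ru5P"}), frozenset({"Gnt", "GntZ"})),        # 4.1, isozymes
    (frozenset({"F6P", "GA3P"}), frozenset({"X5P"}), frozenset({"tkt"})),           # lumped PP shunt
    (frozenset({"X5P"}), frozenset({"Ru5P"}), frozenset({"rpe"})),                  # epimerase
    (frozenset({"Ru5P"}), frozenset({"R5P"}), frozenset({"rpiA"})),                 # 4.2
    (frozenset({"fumarate"}), frozenset({"malate"}), frozenset({"fumA", "fumC"})),  # fumarase isozymes
]


def reachable(sources: AbstractSet[str], knockouts: AbstractSet[str] = frozenset()) -> Set[str]:
    """Metabolites producible from `sources` after deleting the genes in `knockouts`."""
    pool = set(sources)
    changed = True
    while changed:                       # repeat until no reaction adds a new metabolite
        changed = False
        for substrates, products, genes in REACTIONS:
            if genes - knockouts and substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


SOURCES = {"GN6P", "F6P", "GA3P", "fumarate"}
print("Ru5P" in reachable(SOURCES, {"Gnt", "GntZ"}))  # True: bypass via X5P
print("R5P" in reachable(SOURCES, {"rpiA"}))          # False: sole catalyst removed
```

Single knockouts of Gnt or fumA leave their products reachable, mirroring the GN calls, whereas removing rpiA blocks ribose-5P, mirroring its GI call. A topology-only model cannot capture cofactor balance (such as NADPH) or toxic intermediates, which the text invokes for the G6P-to-GN6P steps.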

Plasmid Growth Impaired Genes
There are unique opportunities and challenges for exploring plasmid biology when conducting INSeq experiments on bacterial species whose genomes contain multiple large plasmids. A mutation that causes loss of a plasmid from the accessory genome, by impairing plasmid stability, appears phenotypically identical to a GI mutant lost from the mutant pool because of a decreased growth rate. Therefore, the INSeq data alone cannot show whether a particular transposon insertion within a plasmid caused a GI phenotype or instead compromised plasmid stability or replication. Every plasmid contained a three-gene cluster of rep genes that were essential (PGI) when mutated (Figure 5; Supplementary File 1); together with the annotated function of these genes, this suggests that the loss of these Tn insertion tags from the mutant population was due to impaired plasmid replication.
Beyond identifying putative rep genes, INSeq can identify plasmid genes that provide the host cell with growth benefits under specific conditions. For example, pRL11 contains a putative eight-gene operon (pRL110625-32), predicted to be involved in cobalamin biosynthesis, that was severely PGI on VMM-mannitol and moderately PGI on TY when mutated. Previous studies in R. etli found similar growth phenotypes when a homologous cobalamin biosynthetic cluster on p42e was deleted (Landeta et al., 2011). Additionally, pRL120209 (putative tpiA) and pRL120210 (putative rpiB), which encode enzymes predicted to function in central carbon metabolism, were PGI when grown on mannitol as a sole carbon source, whereas their chromosomal homologs RL2513 (tpiA) and RL2547 (rpiB) were both GN (Table 2). In the closely related R. leguminosarum bv. viciae strain VF39, the homologs of pRL120209 and pRL120210 are required for growth on erythritol as a sole carbon source (Yost et al., 2006). This suggests that pRL12 may carry genes important for the normal growth of RLV3841 under specific conditions, in agreement with previous studies showing that pRL12-cured strains of R. leguminosarum were unable to grow on minimal media (Hynes et al., 1989).
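The comparison between the pRL12 loci and their chromosomal homologs can be expressed as a filter over homolog pairs. The Python sketch below encodes only the four calls quoted in this section, treats PGI and GI alike as impaired, and assumes that the chromosomal GN calls refer to the same VMM-mannitol screen; the names are hypothetical. As noted above, a PGI call on a plasmid may also reflect plasmid loss rather than impaired growth, so a flagged pair is a hypothesis, not a conclusion.

```python
from typing import Dict, List, Tuple

# Calls quoted in the text (assumed to be from the VMM-mannitol screen).
CALLS_VMM: Dict[str, str] = {
    "pRL120209": "PGI",  # putative tpiA on pRL12
    "pRL120210": "PGI",  # putative rpiB on pRL12
    "RL2513": "GN",      # chromosomal tpiA homolog
    "RL2547": "GN",      # chromosomal rpiB homolog
}

# (plasmid locus, chromosomal homolog) pairs named in the text
HOMOLOG_PAIRS: List[Tuple[str, str]] = [
    ("pRL120209", "RL2513"),
    ("pRL120210", "RL2547"),
]

IMPAIRED = {"GI", "PGI"}


def plasmid_copy_dominant(
    pairs: List[Tuple[str, str]], calls: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pairs where only the plasmid-borne copy shows impaired growth.

    Both loci must have a call; pairs with a missing call are skipped.
    """
    return [
        (plasmid, chrom)
        for plasmid, chrom in pairs
        if plasmid in calls
        and chrom in calls
        and calls[plasmid] in IMPAIRED
        and calls[chrom] not in IMPAIRED
    ]


if __name__ == "__main__":
    print(plasmid_copy_dominant(HOMOLOG_PAIRS, CALLS_VMM))
```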

Technical Considerations of INSeq in RLV3841
INSeq, like all high-throughput molecular techniques, has limitations. Genes with large regions of sequence duplication, or with no mariner "TA" insertion sites, cannot be assayed using INSeq; however, these genes represent only 1.9% of the genome. The targeted library preparation method and robust statistical analysis afforded by the MmeI-adapted mariner transposon appear to outweigh these disadvantages. Sufficient saturation of neutral mariner insertion sites within the mutant community allows confident identification of regions that lack insertions because of negative selection against a GI phenotype. In this and previous work, a level of neutral "TA" site saturation sufficient for Bayesian analysis was recovered using a relatively modest amount of sequencing data compared with other mariner INSeq studies. Inoculation density, the number of generations of growth during negative selection, and the ability to recover mutant populations need to be carefully considered to ensure that enough complexity is retained in the mutant pools after selection to allow statistical analysis.
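Why saturation matters can be illustrated with a simplified independence model, which is not the Bayesian method used in this study: if each neutral TA site is hit with probability s, a neutral gene with n TA sites receives no insertions by chance with probability (1 - s)^n. The Python sketch below counts TA sites (the dinucleotide is its own reverse complement, so scanning one strand suffices) and computes that probability; the function names, the independence assumption, and the example values are illustrative only.

```python
def count_ta_sites(seq: str) -> int:
    """Count 'TA' dinucleotides (mariner target sites) on one strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA")


def prob_no_insertions(n_sites: int, saturation: float) -> float:
    """Chance that a neutral gene with n_sites TA sites gets zero insertions,
    assuming each site is hit independently with probability `saturation`."""
    if not 0.0 <= saturation <= 1.0:
        raise ValueError("saturation must be between 0 and 1")
    return (1.0 - saturation) ** n_sites


def min_sites_for_confidence(saturation: float, alpha: float = 0.01) -> int:
    """Smallest TA-site count for which an insertion-free neutral gene
    would arise by chance with probability below alpha."""
    if not 0.0 < saturation <= 1.0:
        raise ValueError("saturation must be in (0, 1]")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    n = 0
    while prob_no_insertions(n, saturation) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    print(count_ta_sites("ATATGCTA"))
    print(prob_no_insertions(10, 0.5))
    print(min_sites_for_confidence(0.5, 0.01))
```

For example, at 50% saturation a neutral gene with 10 TA sites lacks insertions by chance with probability 0.5^10 (about 0.001), and at least 7 TA sites are needed before that chance falls below 1%. Under this model, genes with few TA sites remain difficult to call even at good saturation.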

AUTHOR CONTRIBUTIONS
BP and CY conceived and designed the research; BP and MA conducted the experiments; BP conducted the data analysis; BP and CY prepared and finalized the manuscript.

FUNDING
The presented research was conducted with support from a Discovery Grant awarded by the Natural Sciences and Engineering Research Council of Canada.