Study of Pyrroloquinoline Quinine From Phosphate-Solubilizing Microbes Responsible for Plant Growth: In silico Approach

Plants cannot uptake the insoluble form of phosphate from soil. Phosphate-solubilizing microbes (PSMs) release gluconic acid (C6H12O7) that is synthesized by the interaction between co-factor pyrroloquinoline quinine (PQQ) and glucose dehydrogenase within themselves and hence convert the insoluble phosphate into a soluble form. Phylogenetic analyses based on individual sequences of PqqA–PqqE proteins involved in the PQQ biosynthetic pathway manifested clear clustering formation of the selected species according to their respective genera such as Pantoea, Rouxiella, Rahnella, Kosakonia, Mixta, Cronobacter, and Serratia. In multiple sequence alignment (MSA), numerous semi-conserved sites were identified that indicate acquired mutation during evolution. The selected pqq genes that appeared within an operon system sustain a specified order viz. pqqABCDE for both positive and negative strands. The nucleotide composition of the encoding genes displayed higher content of GCs at different positions of the codons and has also been properly reflected in relative synonymous codon usage (RSCU) values of the codons with few exceptions. The correspondence analysis (COA) based on RSCU proclaimed that the pqqB genes prefer A/U-ending codons over G/C, while for the pqqE gene, G/C-ending codons are comparatively more preferable (except CGU). Mutational pressure contributes to shaping the codon usage pattern for the selected pqq genes evinced from the COAs, while the ENc and neutrality plot gives attestation of natural selection. The higher values of CAI indicate the gene adaptability and codon usage bias. These comprehensive computational studies can be beneficial for further research in molecular phylogenetics, genomics, and proteomics and to better understand the evolutionary dynamics of PQQ.


INTRODUCTION
Microbes present in the rhizosphere, the nutrient-rich region of the soil surrounding the plant's roots, play important roles in the enhancement of plants' nutrient uptake, metabolic activities, crop productivity, and tolerance to multiple biotic and abiotic stresses. Plant growth-promoting rhizobacteria (PGPR) are a group of free-living soil bacteria found in the rhizosphere that directly and organic phosphate mineralization are the predominant ways of dissemination of P in the soil by PSMs (Alori et al., 2017). Inorganic phosphorus like Ca 3 (PO 4 ) 2 , FePO 4 , and AlPO 4 are solubilized in the soil in the following ways: production of low-molecular-weight organic acid, inorganic acid (such as sulfuric, nitric, and carbonic acids), and H 2 S, proton release from NH + 4 (assimilation/respiration), direct oxidation pathway, production of siderophores, and exopolysaccharides (EPSs) (Rodríguez and Fraga, 1999;Sharma et al., 2013;Zhao et al., 2014;Alori et al., 2017). The main attention of this study is the phosphorus solubilization by organic acid production like gluconic, 2-keto gluconic, citric, oxalic, malic, fumaric, malonic, tartaric, glutaric, propionic, butyric, glyoxylic, and adipic acid exuded by PSMs (Sharma et al., 2013). The organic acid is produced by the oxidation pathway or fermentation and respiration of organic carbon compound in periplasmic space (Zhao et al., 2014). Excretion of organic acid chelates the phosphorus-bound cations such as Fe 3+ , Al 3+ , and Ca 2+ (Goldstein, 1994) through their hydroxyl and carboxyl groups and lowers the rhizospheric pH through O 2 /CO 2 exchange and proton-bicarbonate balance, thereby releasing the bound phosphorus (Kim et al., 1997). Gluconic acid and 2-ketogluconic acid are the most common acids excreted by PSMs. Previously, it has been reported that PSMs improve the growth and yield of various crops, including rice, maize, wheat, sugarcane, legumes, soybean, mustard, chickpea, peanut, sugar beet, tomato, potato, etc. (Rodríguez and Fraga, 1999;Alori et al., 2017;Kalayu, 2019). Thus, inoculating seeds/crops/soil with PSMs as biofertilizers is a promising approach to improve world food production without causing any environmental hazards (Alori et al., 2017). Glucose dehydrogenase (GDH) enzyme encoded by glucose dehydrogenase (gcd) gene accompanied by cofactor PQQ (pyrroloquinoline quinine) forms gluconic acid (An and Moe, 2016).
PQQ is a small, low-molecular-weight redox-active cofactor for several bacterial glucose/alcohol dehydrogenases and participates in methylotrophic metabolism (Duine, 1999;Anthony, 2001;Matsutani and Yakushi, 2018). The biosynthesis of PQQ is accomplished by the gene products of a specific PQQ operon that comprises six genes, pqqA-F (Meulenberg et al., 1992). In the PQQ operon, genes can be organized differently in various organisms. For example, in Acinetobacter calcoaceticus, pqqF is absent in the PQQ operon (Goosen et al., 1987(Goosen et al., , 1989; in Methylobacterium exotorquens, pqqD fuses with pqqC and pqqA-E gene present in the PQQ operon, but pqqF and pqqG are located outside the operon (Morris et al., 1994;Chistoserdova et al., 2003). It has been proposed that pqqF/pqqG is a labile element within the PQQ operon. Genetic knockout studies of pqqF suggest that it is not essential for cofactor production; other non-specific cellular proteases can make up for that during PQQ biogenesis (Velterop et al., 1995). Conservation of pqqA, pqqC, pqqD, and pqqE is an essential requirement for this pathway that is recognized by genetic knockout experiment (Sonnenburg and Sonnenburg, 2019). Klebsiella pneumoniae needs six genes designated as pqqA, pqqB, pqqC, pqqD, pqqE, and pqqF (Puehringer et al., 2008); seven genes are required for Methylobacterium extorquens (AM1) designated as pqqDGCBA and pqqEF (Morris et al., 1994), whereas A. calcoaceticus requires only four genes (gene I, II, III, and IV) (Goosen et al., 1989) for PQQ synthesis. In Pseudomonas fluorescens B16, the PQQ operon consists of 11 genes (pqqABCDEFHIJKM) (Choi et al., 2008). PqqA is a 22-amino-acid peptide containing conserved Glutamic acid (Glu) and Tyrosine (Tyr); side-direct mutagenesis identified these two residues on PqqA as a provider of carbon and nitrogen atoms required for PQQ synthesis (Goosen et al., 1987). Barr et al. (2016) reported carbon-carbon bond formation between Glu15 and Tyr19 side chain within the precursor peptide PqqA in the presence of PqqE and peptide chaperone PqqD. PQQ operon forms PQQ by the following chemical steps (Supplementary Figure 1): at first, PqqD interacts with PqqA to modify Glu and Tyr side chains of PqqA, and this modified complex gets attached to the active side of PqqE. PqqD is a small, free protein, but in the reaction, it fuses with PqqC at the C-terminal and with PqqE at the N-terminal (Shen et al., 2012). PqqE is a radical S-adenosyl-L-methionine (SAM) protein that contains a conserved CXXXCXXC motif (Menendez et al., 1995) and a [4Fe-4S] cluster near the N-terminal (Broderick et al., 2014), with a C-terminal SPASM domain containing two more iron-sulfur clusters  or  in Auxiliary Site 1 (Aux1) (Grell and Goldman, 2014;Saichana et al., 2017;Tao et al., 2019) and Auxiliary Site 2 (Aux2). The [4F-4s] cluster near the N-terminal in PqqE promotes the cleavage of SAM, in the presence of reducing reagent (Sodium dithionite) that leads to the formation of a 5 ′ -deoxyadenosyl radical that eliminates a hydrogen atom from the conserved glutamate side chain of PqqA, to establish C-C cross-linking to conserve Tyr (Wecksler et al., 2009). This product is hydrolyzed by the PqqF/PqqG to cut off N-and C-terminal amino acids and cellular proteases generate a Glu-Tyr di-amino acid for PqqB. PqqB acts as an oxygenase that hydroxylates Tyr of Glu-Try di-amino acid into a 3,4 dihydroxy intermediate that oxidized again to a trihydroxy derivative of Tyr (Klinman and Bonnot, 2013;Koehn et al., 2019). Then, it undergoes spontaneous cyclization yielding 3a-(2-amino-2-carboxyethyl)-4, 5,6,7,8, as the substrate for the final oxidative steps catalyzed by PqqC (Zhu and Klinman, 2020). Escherichia coli and Salmonella typhimurium are impuissant to produce PQQ due to the lack of genes required for PQQ biosynthesis and, consequently, obtain the PQQ necessary for their survival in that environment (Matsushita et al., 1997;RoseFigura, 2010). It has been reported that strain EF260, a derivative of E. coli FB8, can synthesize PQQ after mutation and can oxidize glucose to gluconic acid via the GCD/PQQ pathway (Biville et al., 1991).
A previous study has reported that the expression of PqqA, a precursor of PQQ, is much higher than other Pqq proteins by using Pqq-lacZ protein fusion (Velterop et al., 1995). The authors have also depicted that the pqq genes can be expressed in anaerobic conditions but cannot produce PQQ, though synthesis of PQQ requires molecular oxygen. Toyama et al. (1997) reported that in Methylobacterium extoroquens AM1, PqqE is homologous to MoaA and nifB in the N-terminal region. It has been well-documented that MoaA protein involved in an early stage in the biosynthesis of the molybdopterin cofactor (Rajagopalan and Johnson, 1992;Rivers et al., 1993), and nifB is involved in an early step in the biosynthesis of the cofactor of nitrogenases (FeMo-, FeV-, and FeFe-cofactors). PQQC/D enzyme reaction requires molecular oxygen and reduced nicotinamide adenine dinucleotides (Toyama et al., 1997). Using Pulsed-field gel electrophoresis (PFGE), a DNA fingerprinting method for bacterial isolation has observed that the ribF gene that produces riboflavin cofactor for gluconic acid dehydrogenase is not closely linked with pqq genes. DNA sequencing of Tn5-induced glucose dehydrogenase (GDH)deficient mutant of Gluconobacter oxydans IFO3263 has revealed that identity to pqqE gene in the insertion site appeared in an open reading frame (Felder et al., 2000). To evaluate whether a PQQ biosynthetic gene is suitable to study the phylogeny of phosphate-solubilizing pseudomonads, two new primers were designed (pqqCf1 and pqqCr1) that specifically amplify the pqqC gene of the Pseudomonas spp. isolated from the wheat root (Meyer et al., 2011). Meyer et al. (2011) reported that the pqqC gene is an excellent molecular marker to study the diversity and evolution of phosphate-solubilizing pseudomonads as pqqC polymorphism is high to ensure an exemplary phylogenetic resolving power complementary to the conventionally used housekeeping genes gyrB and rpoD. Phylogenetic analyses have also reported on pqqC genes in Pseudomonas sp. (Xu et al., 2014). PQQ-producing bacteria are invasive (≈50%) and pathogenic (≈25%), which indicate no association between PQQ biosynthetic capabilities and pathogenicity. A total of 144 distinct species have been found by performing sequence alignment of pqqC, pqqD, and pqqE genes. The majority of these are α, β, and γ classes of proteobacteria, prevalent in Gramnegative bacteria, wild-type pqq, and various active site mutants (RoseFigura, 2010). Regulation of glucose dehydrogenase and PQQ cofactor in the model Pseudomonas putida KT2440 of broccoli rhizosphere soil has been analyzed. In this study, glucose dehydrogenase and PQQ levels vary upon growth condition, with a high level of glucose as the sole carbon source and low soluble phosphate (An and Moe, 2016). Highly efficient phosphate-solubilizing bacteria (PSB) Burkholderia ultivorans WS-FJ9 from pine tree rhizosphere solubilizes both organic and inorganic phosphate. The amount of solubilized inorganic phosphates by the WS-FJ9 strain is ∼140 mg/L. AP-2, gspE, and gspF genes are related to organic phosphate, hlyB is related only to inorganic phosphate, and phoR, phoA, AP-1, and AP-3 are related to both, as observed by using Next-Generation Sequencing (NGS) technology (Liu et al., 2020). Most of the studies on the phylogenetic analyses have targeted either individual or two genes, and none of them has employed every single gene involved in the entire PQQ biosynthetic pathway for understanding their evolution in Gammaproteobacteria. Phylogenetic analysis of pqqA, pqqB, pqqD, and pqqE genes has not been reported so far in Gammaproteobacteria. Hence, for the very first time in this article, a systematic study of the complete set of genes from selected Gammaproteobacteria, involved in the entire PQQ biosynthetic pathway, are studied to understand their phylogenetic relationships.
Codon usage bias is the non-random use of synonymous codons, observed in a wide range of organisms such as bacteria, yeast, plants, and mammals and mainly determined by mutation (e.g., GC content, mutation frequency, and their pattern) and natural or translational selection (e.g., gene expression level, tRNA abundance, protein length, gene translation initiation signals, and protein structure) (Ikemura, 1981(Ikemura, , 1985Gouy and Gautier, 1982;Bulmer, 1991;Duret and Mouchiroud, 1999;Gu et al., 2004;Wan et al., 2004;Trotta, 2013). Studies have discovered that mutational pressure act upon genes encoding common and uncommon amino acids (Hershberg and Petrov, 2008), whereas selection favors specific codons that promote efficient and accurate translation of genes with a high level of expression (Duret, 2000;Hershberg and Petrov, 2008). Several studies on codon usage have revealed that other factors could influence codon usage patterns as well, including secondary protein structure, replication, hydrophobicity, and hydrophilicity of the protein and the external environment (Lobry and Gautier, 1994;D'Onofrio et al., 2002;Sharp et al., 2005). Codon usage bias helps to understand the evolution of different organisms, in addition to their environmental adaptation (Angellotti et al., 2007). In context, natural consortium of five acidophilic bacteria used for biomining preferentially has low codon usage bias. Bacterial behavior in community/consortia cannot be equated at all times with their isolated state (Hart et al., 2018).
The prime objectives of this contemporary study are to (i) investigate the phylogenetic relationships of pqq genes in selected Gammaproteobacteria for an advanced understanding of their functional evolution; (ii) reveal the amino acid composition of the selected proteins and nucleotide composition of the encoded genes along with their organization within the genome of the selected microorganisms; (iii) discern whether the pqq genes are affected by mutational and selection pressure; (iv) investigate domain analysis of pqq genes with glucose dehydrogenase; (v) investigate correspondence analysis (COA) and codon usage bias of the proteins and encoded genes involved in PQQ biosynthesis from Gammaproteobacteria to obtain an insight into the major forces that shape synonymous codon usage bias; (vi) recognize the optimal codons that provide useful information about genetic engineering, gene prediction, and molecular evolutionary studies; and (vii) measure the Codon Adaptation Index (CAI) as an indicator of the adaptability of those genes.
This study will provide an insight into the compositional analysis of codon usage in pqq genes from phosphate-solubilizing Gammaproteobacteria based on GC content along with GRAVY (Grand average of hydropathicity) and aromaticity score, amino acid composition, the relationship between GC3% and expected effective number of codons (ENc or Nc), neutrality plot, and exploration of a crucial statistical method for the analysis of the data. Analysis of various codon usage indices can provide a better understanding of the pattern of synonymous codon usage bias in pqq genes from Gammaproteobacteria. Moreover, the information on optimal codons and factors affecting codon usage will facilitate further research in molecular phylogenetics and genomics and will help to better understand the evolutionary dynamics of the PQQ biosynthetic pathway valuable for further biotechnological studies.
Our analyses have given a novel insight into the codon usage patterns of the genes involved in the PQQ biosynthetic pathway that would assist in better understanding of the synonymous codon usage pattern as well as the factors influencing it.

Phylogenetic Analysis
To study the evolutionary relationship of the pqqA, pqqB, pqqC, pqqD, and pqqE genes of 79 Gammaproteobacteria including S. marcescens (DQ868536.1) and different species, subspecies, and strains of Klebsiella, Enterobacter, Pantoea, Nissabacter, Erwinia, Kosakonia, Rahnella, Mixta, Cronobacter, Rouxiella, and Serratia, multiple sequence alignment (MSA) of the protein sequences was performed with Clustal W (Thompson et al., 1994) alignment program with default parameter values (gap opening penalty of 10 and a gap extension penalty of 0.20) available in MEGA6.60 (Molecular Evolutionary Genetics Analysis) (http:// www.megasoftware.net/; Tamura et al., 2013). The phylogenetic trees based on the MSA of their protein sequences were constructed by using the unrooted Neighbor-joining method (NJ) (Saitou and Nei, 1987) using p-distance with a uniform rate and complete deletion of gaps/missing data, and the confidence level of each node was estimated by the Bootstrap method using 1,000 replications and displayed using MEGA6.06 program.
In bacterial phylogenetic studies, the 16S rRNA gene is one of the most frequently used genes. Phylogenetic analysis of 16S rRNAs allows the accurate statistical measurement of a broad range of evolutionary relationships due to highly conserved sequences. Therefore, we have used 16S rRNA gene sequences to understand the evolutionary relationship among various Gammaproteobacteria. After collecting the 16S rRNA sequences of 79 Gammaproteobacteria from NCBI (https://www.ncbi.nlm. nih.gov), a phylogram was constructed to analyze the relationship of those 16S rRNA genes.

Multiple Sequence Alignment
To study the conservation of the amino acids in the different sites within the domains, MSA was performed with ClustalOmega (http://www.ebi.ac.uk/Tools/msa/clustalo; Sievers et al., 2011) with the default parameters (gap penalty 10.0, gap extension penalty 0.05, weight matrix BLOSUM) and the JalView display option was used along with the sequence logo for the same by using WebLogo-3 (http://weblogo.threeplusone.com/; Crooks et al., 2004) (http://weblogo.berkeley.edu/logo.cgi).

Detection of Conserved Domains
For the identification of the different domains present in the pqqA-pqqE proteins from selected organisms, the identified protein sequences were analyzed using the Conserved Domain Search tool (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb. cgi) against Pfam database and HMMSCAN (http://www.ebi.ac. uk/Tools/hmmer/search/hmmscan) with default parameters.

Compositional Analysis and Codon Usage Indices
Codon W 1.4.4 (http://codonw.sourceforge.net/, Pedan, 2000) was used to calculate the indices of codon usage. Two single codons for methionine (AUG) and tryptophan (UGG) and three stop codons (UAA, UAG, and UGA) were excluded from the calculation. The following codon indices were determined: GCcontent at the first, second, and third codon positions (GC1, GC2, and GC3); frequency of either G or C at the third codon position of synonymous codons (GC3s); and the average of GC1% and GC2%, i.e., GC12%, general average hydropathicity (GRAVY) score (Kyte and Doolittle, 1982), Aromaticity score (Lobry and Gautier, 1994), and Effective number of codons (ENc) (Wright, 1990). The abovementioned parameters were calculated using the cDNA sequences of the genes obtained from Gammaproteobacteria.
GRAVY score was calculated as the sum of the hydropathy values for all the amino acids in a protein divided by the total number of residues in it (Kyte and Doolittle, 1982). A more negative GRAVY value indicates that the protein is more hydrophilic and vice versa. Aromaticity scores denote the frequency of aromatic amino acids Phenylalanine (Phe), Tyrosine (Tyr), and Tryptophan (Trp) in the hypothetically translated gene product (Lobry and Gautier, 1994). The hydropathicity and aromaticity protein scores are indices of amino acid usage and the variation in amino acid compositions can influence the results of codon usage analysis. The 20 amino acids have been classified into five groups based on their polarity and charge, i.e., Acidic polar: Aspartic acid (D) and Glutamic acid (E); Basic polar: Histidine (H), Lysine (K), and Arginine (R); Aromatic: Phenylalanine (F), Tyrosine (Y), and Tryptophan (W); Neutral non-polar: Proline (P), Cysteine (C), Methionine(M), Glycine (G), Alanine (A), Valine (V), Isoleucine (I), and Leucine (L); and Neutral polar: Glutamine (Q), Asparagine (N), Serine (S), and Threonine (T). An in-house PERL program was used to calculate the amino acid frequency of the protein encoded by pqqA, pqqB, pqqC, pqqD, and pqqE genes, using a particular codon for each time.
Relative synonymous codon usage (RSCU) is another wellknown measure of codon bias, used to examine the frequency of each synonymous codon that encodes the same amino acid. It was calculated as the ratio of the observed frequency of codons relative to the expected frequency of the codon under a uniform synonymous codon usage (Sharp and Li, 1986). RSCU values are not affected by sequence length and amino acid frequencies since these factors were eliminated during the computation. RSCU values of every codon for pqqA, pqqB, pqqC, pqqD, and pqqE genes were calculated using an in-house or home-based PERL program (Mondal et al., 2016). A codon with an RSCU value of more than 1.0 has a positive codon usage bias, while a value of <1.0 has a negative codon usage bias. RSCU value equal to 1.0 indicates that all the synonymous codons encoding the same amino acid were used equally and randomly, and that is nearly unbiased (Sharp and Li, 1986). Moreover, the synonymous codons with RSCU values >1.6 and <0.6 are considered as overrepresented and under-represented codons, respectively (Sharp and Li, 1986).
The neutrality plot is an analytical method used to measure codon usage patterns. To estimate the extent of directional mutational pressure against selection in the codon usage bias, a neutrality plot was drawn that reveals the results of the equilibrium coefficient of mutation and selection. GC12%, i.e., the average of GC1% and GC2% along the y-axis, is plotted against GC3% along the x-axis (Sueoka, 1988). In neutrality plots, if the dots are distributed diagonally with a regression coefficient value close to 1, mutation bias is assumed to be the principal force, shaping codon usage, whereas scattered distribution of dots suggests a significant role of natural selection with the regression coefficient close to zero (Sueoka, 1988).
The ENc or Nc vs. GC3% plot is widely referred to ascertain whether the codon usage of a gene is affected by mutational or selection pressure (Wright, 1990). An ENc or Nc plot has been drawn using the ENc or Nc value as the ordinate and the GC3 value as the abscissa. When the corresponding points fall near the expected curve, the mutation is considered as the principal force shaping codon usage, and when the corresponding points fall considerably below the expected curve, the prime force shaping codon usage is the selection. The ENc or Nc is used to measure the codon bias in an individual gene that is essentially independent of gene length (Wright, 1990). In this study, the ENc values were calculated for each gene of the selected microorganisms from GC3s under H0 (Null hypothesis, i.e., no selection) referring to the given equation, where S denotes GC3 (Wright, 1990).
The ENc value ranges from 20 (for a gene with extreme bias using only one codon per amino acid) to 61 (for a gene with no bias, using synonymous codons equally), and in case the value of ENc is >40, the codon usage bias is regarded as low (Wright, 1990). ENc values <35 denote higher codon usage bias, and the value that is more than 50 denotes general random codon usage (Wright, 1990).

Correspondence Analysis
The COA is a widely employed multivariate statistical method used to study the significant trends in codon usage variation among genes. In COA, all genes are plotted in a 59-dimensional hyperspace according to the RSCU values of 59 sense codons (Greenacre, 1984). Sequences where a given codon is applied in a similar fashion lie close to each other on the graph. COA provides an important trend of factors related to codon usage in different gene sets. The major trends in codon usage variation can be determined with relative inertia, according to which the genes are located to investigate the major factors affecting the codon usage pattern.
To investigate the major trend in RSCU variation among genes and distribute the genes along continuous axes in accordance with these trends, the COA  has been performed using the program Codon W 1.4.4.

Determination of Optimal Codons
Based on the principal axis of COA (axis 1), the top and the bottom 5% of genes were considered as the high and low datasets, respectively. When one codon's RSCU values in both the datasets were significantly correlated by a two-way Chisquared contingency test (p < 0.01), then the codon was defined as optimal codon (Liu, 2006;Huang et al., 2017;Begum and Mondal, 2020).

Expressional Probability
The geometric mean of the weight associated with each codon over the length of the gene sequence (measured in codons) is considered as the CAI (Sharp and Li, 1987), which measures the degree with which genes use preferred codons and assess the level to which selection can effectively mold the pattern of codon usage. The CAI values may range from 0 to 1, and the larger ones reveal a higher gene expression level as well as codon bias. The Sharp and Li method has been followed to calculate the CAI by employing an in-house PERL script and MS Excel 2007.

Statistical Analysis
We have calculated the average and standard deviation of RSCU of the codons of pqqA, pqqB, pqqC, pqqD, and pqqE genes of selected Gammaproteobacteria and amino acid frequencies for the proteins encoded by the genes in different microorganisms and for the entire selected microorganisms as well. To understand the preference of either codon on the respective microorganisms, Student's t-test (Ewens and Grant, 2001) was performed based on G-and C-ending codons of amino acids having a minimum of four degenerative codons with variation only in the third position using Microsoft Office Excel 2007. Under the null assumption, t statistics were calculated, from which the p-value was derived and taken as significant when p < 0.05.

Phylogenetic Analysis
The phylogenetic tree based on PqqA amino acid sequences (Supplementary Figure 2) has disclosed a proximate relation between Klebsiella and Enterobacter together with Pantoea, Nissabacter, Erwinia, Kosakonia, and Rahnella; Mixta, Cronobacter, Rouxiella, and Serratia. Phylogenetic analyses of PqqA, PqqB, PqqC, PqqD, and PqqE proteins involved in the PQQ biosynthetic pathway have promulgated a close association of Klebsiella and Enterobacter. Similarly, a phylogenetic tree based on PqqB amino acid sequences (Supplementary Figure 3) has revealed a close association of Nissabacter, Pantoea, Rouxiella, Erwinia, and Rahnella; Kosakonia, Mixta, and Cronobacter and all the species of Serratia. The phylogenetic tree based on the amino acid sequences of PqqC (Figure 1) has displayed a close association of Kosakonia and Cronobacter; Rouxiella and Rahnella, including the entire set of species of Pantoea and Serratia as well. By the same token of the above case, the phylogenetic tree based on PqqD amino acid sequences (Supplementary Figure 4) has revealed a close interrelation of Kosakonia and Cronobacter; Erwinia, Nissabacter, and Rahnella; three species of Mixta; and every selected species of Serratia. The phylogenetic tree based on the amino acid sequences derived from the pqqE genes (Supplementary Figure 5) has exhibited a neighboring association of Kosakonia and Cronobacter; Erwinia, Rouxiella, and Rahnella; as well as every single species of Serratia.
Overall, the phylogenetic analysis indicated a relatively close phylogenetic association of Klebsiella and Enterobacter, and all selected species of Pantoea, Rouxiella, Rahnella, Kosakonia, Mixta, Cronobacter, and Serratia. There was a close association of Kosakonia and Cronobacter in pqqC, pqqD, and pqqE genes. The similarity between the proteins of each clade may reflect their evolutionary interrelation.
Phylogenetic analyses have reported on pqqC genes in Pseudomonas sp. (Meyer et al., 2011;Xu et al., 2014). Meyer et al. (2011) reported that the pqqC gene is suitable to study the phylogeny of phosphate-solubilizing Pseudomonas populations in natural habitats. Comparing pqqC phylogeny and phosphate solubilization activity, they identified one phylogenetic group with high solubilization activity. The phylogenetic tree of the protein coding sequences of the pqq operon showed two main clusters of phosphate-solubilizing endophytic P. fluorescens strains (Oteino et al., 2015). These two clusters were consistent with the presence or absence of a 139-to 204-bp intergenic region between the pqqB and pqqC genes (Oteino et al., 2015). Matsumura et al. (2014) found that basidiomycete Coprinopsis cinerea (mushroom) contains a new type of PQQ-dependent sugar oxidoreductase. Phylogenetic analyses from the amino acid sequences of Coprinus cinereus sugar dehydrogenase (CcSDH), proteins homologous to CcSDH, and known quinoproteins suggested that these quinoproteins may be members of a new family that is widely distributed not only in prokaryotes, but also in eukaryotes (Matsumura et al., 2014). Phylogenetic analysis of full-length sequences of pqqE gene of several bacterial strains revealed that it was related to other Serratia strains, closely related to S. marcescens strains (Ludueña et al., 2017). Earlier, it has been reported that the pqqE gene is highly conserved within genus taxonomic level (Wecksler et al., 2009). The phylogenetic and codon usage analysis of the soil bacteria consistently demonstrated the relatedness of Azotobacter chroococcum with different species of the genus Pseudomonas (Saha et al., 2019). FIGURE 1 | Phylogenetic tree showing the relationship between the diverse Gammaproteobacterial groups considered in this study based on homologs of PQQ biosynthesis pathway-associated pqqC gene sequences. Amino acid sequences of pqqC genes from S. marcescens (DQ868536.1) and their homologous sequences from various Gammaproteobacterial species, including Klebsiella, Enterobacter, Pantoea, Nissabacter, Erwinia, Kosakonia, Rahnella, Mixta, Cronobacter, Rouxiella, and Serratia, were aligned with Clustal W algorithm, and the unrooted phylogenetic tree was constructed by the Neighbor-joining method based on the p-distance using MEGA6.06 software (http://www.megasoftware.net/, Kumar et al., 2008). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary relationship of the taxa analyzed (Felsenstein, 1985). The number in each node indicates the confidence value of that branch after bootstrapping the phylogenetic tree. A uniform rate was used to model evolutionary rate differences among sites. Complete deletion of gaps/missing data was used. Phylogenetic trees generated from pqqA, pqqB, pqqD, and pqqE are shown in Supplementary Figures 1-4. The phylogenetic tree obtained using 16S rRNA sequences (Figure 2) manifested a relatively close phylogenetic interconnection of Klebsiella and Enterobacter that authenticates our previous phylogenetic tree. It was observed that there was a close association of seven species of Serratia that appeared after 30 species of Klebsiella and Enterobacter, whereas 2 species of Serratia were observed between 8 species of Pantoea and 3 species of Klebsiella. Eventually, three species of Mixta appear in between three species of Pantoea at the forepart and six species of Pantoea at the tag end in contrast, which does not corroborate our previous phylogenetic tree. Mukhtar et al. (2020) identified halophilic bacteria using 16S rRNA sequence analysis and characterized their plant growth-promoting abilities. They had shown that bacterial strains belonging to Bacillus, Halobacillus, and Pseudomonas were dominant in the rhizosphere of halophytes. The phylogenetic tree based on gyrB, FIGURE 2 | A phylogenetic tree showing the relationship based on 16S rRNA gene sequences among the diverse Gammaproteobacterial groups considered in this study. 16s rRNA gene sequences were aligned with the Clustal W algorithm and an unrooted by the Neighbor-joining Maximum Composite Likelihood (MCL) method (NJ-MCL) using MEGA6.06 software (http://www.megasoftware.net/, Kumar et al., 2008). The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary relationship of the taxa analyzed (Felsenstein, 1985). The number in each node indicates the confidence value of that branch after bootstrapping the phylogenetic tree. A uniform rate was used to model evolutionary rate differences among sites. Complete deletion of gaps/missing data was used.
rpoB, and the 16S rRNA sequences of Alteromonas species revealed that the four Alteromonas stellipolaris strains showed a high level of relatedness among this group of species (Torres et al., 2019).

Multiple Sequence Alignment
MSA methods refer to a series of algorithmic solutions for aligning three or more homologous biological sequences (DNA, RNA, or protein) considering evolutionary events (mutations, insertions, deletions, etc.) gradually. MSA was performed to study amino acid conservation at PqqA, PqqB, PqqC, PqqD, and PqqE protein sequences.
In the Jalview representation, the corresponding height of the different amino acid residues in the sequence logo and the color conservation at the different sites disclose a few variations in the PQQ encoding genes, suggesting the level of FIGURE 3 | Semi-conservative sites of PqqA-PqqE proteins. Notes: The conserved sites generated from the MSA of all the identified pqqA-pqqE proteins from different species, subspecies, and strains of Klebsiella, Enterobacter, Pantoea, Nissabacter, Erwinia, Kosakonia, Rahnella, Mixta, Cronobacter, Rouxiella, and Serratia, respectively, were represented in sequence logos based on the alignment of full-length sequences of all the Pqq proteins and their homologous sequences. The overall height of each stack indicates the conservation of the sequence at the corresponding location, while the height of the letters within each stack indicates the relative frequency of the corresponding amino acid. All conserved site-wise analyses have been represented in Supplementary Table 2. conservation of amino acids. In contrast to the fully conserved amino acid sites, the non-or semi-conserved sites are more beneficial for providing information for evolutionary study. The amino acid pattern present in the non/semi-conserved sites gathers quite useful facts on the possible places of changes or mutations (Supplementary Table 3) that occurred in these sequences during evolution, accentuating differences in the organization of the identified proteins in different clades, as evidenced in the phylogenetic analyses (Figure 3).
From the Jalview representation, we have observed many non-conserved amino acids present in the PqqA. Depending on that, we have detected several mutable sites as follows with a positive impression over the phylogenetic tree (details of the mutable sites in Supplementary Materials). Site 2 in the aligned condition was occupied by amino acids Serine and Threonine. Serratia, Cronobacter, and Rouxiella have possessed Threonine. Mixta possesses Serine at this site. Serine and Threonine both are neutral, polar, and hydrophilic in nature. Therefore, at this site, a silent mutation has occurred. Effect of mutation is observed on the phylogenetic tree. Site 3 holds amino acids Threonine, Lysine, and Alanine. Serratia and Rouxiella were occupied by the amino acid Threonine. Cronobacter possesses Lysine. Only Mixta possesses Alanine. Alanine is non-polar and hydrophobic, while Lysine and Threonine are polar and hydrophilic. Alanine and Threonine are neutral, while Lysine is basic. Considering this, a mutation occurred at that site, and phylogenetic impact was observed. Just as in PqqB, a kind of similarity was noticed for the mutable sites from the Jalview representation too (details of the mutable sites in Supplementary Materials). At site 193, Serratia possesses Glycine. Kosakonia, Cronobacter, Mixta, Nissabacter, and Rahnella possess Serine and Pantoea, and Klebsiella possesses Threonine at this site. Serine and Threonine both are neutral, polar, and hydrophilic but Glycine, which does not possess any hydrophobicity or hydrophilicity, is neutral, a non-polar special type of amino acid, although a mutation occurred at this site, and the phylogeny impact was notable. Site 159 was almost occupied by the amino acid Isoleucine. Only Klebsiella holds the amino acid Valine. Isoleucine and Valine are neutral, non-polar, and hydrophobic. Considering this, a silent mutation occurs at this site, and phylogenetic impact is observed. By the same token in the conserved domain in PqqC, some inequalities were noted in the event of mutation at different mutable sites (details of the mutable sites in Supplementary Materials). Site 178 holds amino acids Aspartic acid, Histidine, Phenylalanine, and Glycine. Nissabacter, Erwinia, and Rahnella possess the amino acid Glycine. Serratia, Kosakonia, Cronobacter, and Mixta possess Aspartic acid. Only Rouxiella was occupied by Histidine. Pantoea and Klebsiella possess Phenylalanine. Phenylalanine and Glycine both are neutral and non-polar. Phenylalanine is hydrophobic and Aspartic acid and Histidine are hydrophilic. Therefore, a mutation occurred at this site with a positive phylogeny impact. PqqD manifests some mutable sites in Jalview representation too (details of the mutable sites in Supplementary Materials). Site 80 in the aligned condition was occupied by amino acids Phenylalanine and Leucine. Serratia, Nissabacter, Mixta, Rahnella, and Erwinia have possessed Phenylalanine. Kosakonia, Pantoea, Cronobacter, and Klebsiella possess Leucine at this site. Phenylalanine and Leucine are neutral, non-polar, and hydrophobic. Therefore, at this site, a silent mutation has occurred with no phylogeny impact. In PqqE, different kinds of similarities were noticed for the mutable sites from the Jalview representation as well (details of the mutable sites in Supplementary Materials). Site 364 hold amino acids Lysine and Threonine. Serratia, Kosakonia, Cronobacter, and Pantoea dispersa possess the amino acid Threonine. The rest of the organisms possess Lysine. Threonine is neutral; on the other hand, Lysine is basic. Both amino acids are polar and hydrophilic. Although mutation occurred at this site, no phylogenetic impact was observed. In contrast, site 273 was almost occupied by the amino acid Leucine. Serratia filled up with Methionine. Leucine and Methionine both are neutral, non-polar, and hydrophobic. The silent mutation occurred at this site, and the phylogenetic impact was notable. These abovementioned outcomes substantiate the evolutionary relationship of the PQQ protein from selected organisms as acquired in the phylogenetic tree.

Compositional Analysis of Codon Usage Based on GC Content
GC content is a very important feature in the analysis of codon usage bias, and the GC contents at the third base of one codon (GC3) are considered to most likely reflect codon usage pattern without deviation (Sueoka, 1988;Palidwor et al., 2010;Mondal et al., 2016). Overall and different positional GC% can provide some information about the relative contribution of mutational and selection pressure on codon usage bias. After analyzing the results, we found that the average GC3s, GC1%, and GC3% in pqqB, pqqC, pqqD, and pqqE genes were high, whereas in pqqA genes, the average of GC1%, GC2%, and GC3% were comparatively low in all six Gammaproteobacteria. The GC content value for the pqqA genes from Cronobacter, Klebsiella, Serratia, Kosakonia, and Rahnella was 54. 62, 46.47, 46.80, 45.84, and 42.36%, respectively, revealing a slightly AT-rich and GC-poor genome ( Table 2). The GC content values for the pqqB, pqqC, pqqD, and pqqE genes from Rahnella to Cronobacter were 62.45-68.78%, 65.08-71.43%, 59.81-71.83%, and 61.31-66.89%, respectively, demonstrating a GC-rich genome. It was observed that GC% at the third position was highest in all the pqq genes in Cronobacter but lowest in the pqqA gene of Rahnella compared to other Gammaproteobacteria. pqq genes were found to be GC-rich; the GC content at the one and third position was higher than at the second codon position, and a significant contrast of GC content was found between the first and second positions and second and third codon positions. Hence, the overall nucleotide composition suggested that the nucleotide C and G occurred more frequently in comparison to A and T in the coding sequences, and it was expected that G/C-ending codons might be preferred over A/T-ending codons in the pqqA, pqqB, pqqC, pqqD, and pqqE genes from all Gammaproteobacteria except for pqqA genes from Klebsiella, Serratia, Kosakonia, and Rahnella.
Genomic GC content varies both within and, considerably, between microbial genomes. Previous studies reported that the GC content of bacterial genomes ranges from 16 to 75%, and wide ranges of genomic GC content were observed within many bacterial phyla, including both Gram-negative and Grampositive phyla (Lightfield et al., 2011). An increase in genomic GC content can result in increased bacterial fitness (Raghavan et al., 2012), and this is associated with stronger selection on base composition (Hildebrand et al., 2010;Raghavan et al., 2012;Bohlin et al., 2017). Ran et al. (2014) demonstrated the coupling between the selection on protein sequences and the optimization of codon usage in a broad range of bacteria and archaea. The strength of this coupling varies over a wide range and strongly and positively correlates with the genomic GC content. They proposed that optimization of codon usage could be one of the key factors that determine the evolution of GC-rich genomes (Ran et al., 2014). Abundance of nitrogen, in soil, alters codon bias and has been identified as a driver for increased genomic GC content in parasitic microorganisms and nitrogen-fixing aerobic bacteria (McEwan et al., 1998;Seward and Kelly, 2016). Higher GC3 in genes is involved in cellular metabolism, and lower GC3 is involved in information storage and processing. High GC3 content provides more targets for methylation, which can serve as an additional mechanism of transcriptional regulation and affect the variability of gene expression (D'Onofrio et al., 2007;Tatarinova et al., 2010Tatarinova et al., , 2013. Botzman and Margalit (2011) have demonstrated that variations in prokaryotic global codon usage bias and GC3 are highly associated with species tolerances to environmental variability. On the other hand, the AT content leads to low thermodynamic stability, which plays a crucial role in the initiation of replication (Rajewska et al., 2012).

Neutrality Plot Analysis
To estimate the extent of directional mutation pressure against selection on the codon usage bias, a neutrality plot (GC12% vs. GC3%) was drawn that reveals the results of the equilibrium coefficient of mutation and selection. Neutrality plot for pqqA, pqqB, pqqC, pqqD, and pqqE genes (Figure 4) revealed no significant correlation between GC3% and GC12% (R2 = 0.007, 0.368, 0.349, 0.015, and 0.458, respectively), which indicates that some additional major factors, such as selection, can also influence the synonymous codon usage bias in all pqq genes of Cronobacter, Klebsiella, Kosakonia, Pantoea, Rahnella, and Serratia. In the neutrality plot, the average value of GC12% varied from 37.5 to 50% for pqqA, 50.81-64.47% for pqqB, 51.19-62.30% for pqqC, 42.71-60.75% for pqqD, and 48.61-62.13% for pqqE genes, whereas GC3% varied from 39.58 to 50%, 58.22 to 80.26%, 60.71 to 83.73%, 56.25 to 75.26%, and 57.77 to 85.53% for pqqA-pqqE genes, respectively. The linear regression line was not along the diagonal plane, which indicates that the genes were not under mutational pressure. Besides, the slope of the regression line near zero suggested that there was a low level of mutation biases or high-level conservation of GC contents throughout the genome. Neutrality plot analysis revealed that natural selection played a prominent role over mutation pressure in shaping the codon usage bias of chloroplast genes in two species of Pisum, viz. P. fulvum and P. sativum (Bhattacharyya et al., 2019), chloroplyll synthesis pathway-associated genes in monocot and dicot (Begum and Mondal, 2020), and Ginkgo biloba (He et al., 2016). The results of our neutrality plot indicated that natural selection played a pivotal role in determining the selective constraints on codon usage bias in the PQQ biosynthetic pathway-associated genes from selected Gammaproteobacteria.

Study of Codon Preference by t-Test
The average and standard deviation of RSCU values of the pqq genes from selected Gammaproteobacteria with preferred codons were furnished in Supplementary Table 4. In Cronobacter, Klebsiella, Kusunoki, Pantoea, Rahnella, and Serratia, significant differences at p < 0.05 among the pqq genes in their codon usage from the t-test have been observed ( Table 3). The abovementioned Gammaproteobacteria showed a preference for the amino acids Alanine (A), Glycine (G), Lysine (L), Proline (P), Arginine (R), Serine (S), Threonine (T), and Valine (V). In Glycine, pqqB, pqqC, pqqD, and pqqE genes of Cronobacter have substantiated preferences toward both U3 and C3 except the pqqA gene that appeared to be more biased to U3 over A3. All pqq genes of Cronobacter evinced predilections toward both U3 and G3 in Lysine. pqqC, pqqD, and pqqE genes of Cronobacter were preferably more biased to both A3 and G3 in Proline; therewithal, pqqB gene was more partial to A3 over U3. Interestingly, all pqq genes of Cronobacter, Klebsiella, and Serratia revealed an increased proclivity toward U3 over A3 in Glycine. Likewise, in the case of Lysine, all pqq genes of Cronobacter and Klebsiella were more biased to U3 over A3. All the pqq genes of Klebsiella, Kosakonia, and Pantoea were biased to C3 over G3 in Arginine. Just in the case of Proline, every ppq gene of Klebsiella, Pantoea, and Serratia and pqqB, pqqC, pqqD, and pqqE genes of Cronobacter and Kosakonia exhibit preferences toward A3 over U3. All genes of Kosakonia (except pqqE) and Pantoea are more biased to G3 over C3 in Valine.
The pqqC genes from some Gammaproteobacterial species present along the positive side of the major axis 1, such as P. agglomerans ASB05, P. agglomerans TH81, S. marcescens SER0094, Serratia sp. LS-1, Serratia ureilytica CC119, Serratia sp. NGAS9, and Serratia sp. SSNIH1, as well as Rouxiella badensis C173, P. agglomerans CFSAN047154, and Serratia sp. FS14 along the negative side of axis 1, prefer to remain close. The closeness of these genes reflects the similarities of their codon usages. Genes from different Gammaproteobacterial species clustered around the major axis 1 may indicate their selection over mutation bias. The pqqC genes from some species of K. pneumoniae, K. quasipneumoniae, and K. variicola, and three strains of R. aquatilis, such as R. aquatilis pKM05, R. aquatilis pKM12v2, and R. aquatilis pKM05, prefer to remain clustered, which indicates a comparatively greater role of selection pressure in codon bias for these genes. The pqqD genes from some Gammaproteobacterial species separated along the major axis 1, such as M. gaviniae DSM 22758, M. calida 5098PV, Cronobacter sakazakii 931, C. sakazakii SP291, and P. ananatis SGAir02. In a similar manner to pqqC genes, some pqqD genes from K. pneumoniae and K. quasipneumoniae prefer to remain close as well. Two bacterial species of Cronobacter, i.e., C. sakazakii CS-09 and C. sakazakii CFSAN068773, and C. sakazakii 931 and C. sakazakii SP291 remain close to each other. The pqqE genes from some Gammaproteobacterial species were not separated along the major axis 1. Genes were not clustered in a particular axis; accordingly, other orthogonal axes contribute to codon bias. Like pqqC genes, some pqqE genes from two Klebsiella species, K. quasipneumoniae and K. variicola, prefer to remain close. Four strains of Rahnella, such as R. aquatilis pKM12v2, R. aquatilis pKM05, and Rahnella sp. Y9602, were close to each other, and two species of Pantoea, such as P. ananatis NN08200 and P. ananatis PA13, were close to each other as well. K. aerogenes AR-0018, K. aerogenes HNHF1, and E. aerogenes KCTC2190 prefer to remain clustered. The closeness of these species reflected the similarities of their codon usages.
COA was performed based on the RSCU values of each gene to analyze the variation of codon usage of the genes (pqqA, pqqB, pqqC, pqqD, and pqqE) from selected Gammaproteobacteria involved in the PQQ biosynthetic pathway (Figure 6). Axis 1 (horizontal axis) represents the main index for affecting codon usage bias.

ENc Plot Analysis
The ENc is an important index to measure the codon usage bias within a genome and plays a crucial role in their codon usage profile. ENc or Nc was plotted against the GC3s for pqqB, pqqC, pqqD, and pqqE genes from Klebsiella, Kosakonia, Pantoea, Rahnella, and Serratia (Figure 7). pqqB, pqqC, pqqD, and pqqE genes from selected groups of microorganisms were located below the expected curve as per Figure 7. These results indicate that translational selection was involved in determining the selective constraints on codon bias in pqqB, pqqC, pqqD, and pqqE genes. Only a small number of pqqD genes from Gammaproteobacteria were slightly near the expected curve, and only one has been observed absolutely on the expectation curve. Microorganisms that are positioned on or close to the curve line were considered to be under mutational pressure. In the present study, the Nc values of the pqqC gene varied from 31.2 (M. gaviniae DSM and Serratia sp. FS14) to 46.73 (K. pneumoniae C16 and K. pneumoniae F16). The Nc values of the pqqD gene varied from 29.65 (Serratia sp. FS14) to 53.16 (Serratia Db11, Serratia SER00094, and Serratia UMH1). The Nc value of the Cronobacter 931 pqqD gene was 31.67. The Nc values of the pqqE gene varied from 32.84 (C. sakazakii SP2) to 48.33 (P. ananatis NN08200). For C. sakazakii CS, Nissabacter sp. SGAir0207, K. pneumoniae LH, and K. pneumoniae sub-pqqE gene, the respective Nc values were 33.6, 33.81, 34.8, and 34.87. The lowest value indicated a high degree of codon bias resulting from a low number of codons preferentially used for amino acids. The calculated Nc values of the rest of the genes from Gammaproteobacteria were observed to be >35, suggesting a weak codon bias.
Previous studies reported that few nif (nitrogen fixation) genes such as nifK, nifD, and nifH from Firmicutes, Euryarchaeota, Proteobacteria alpha and delta (Mondal et al., 2016) and rbcL (ribulose-1, 5-bisphosphate carboxylase oxygenase) gene in three prokaryotic families such as archaea, cyanobacteria, and proteobacteria (Mondal et al., 2013) were found under mutational pressure. ENc analysis suggested that mutational pressure and selection constraint were the main factors for shaping codon usage in Chlorophyll (Chl) synthesis and degradation pathway-associated genes in monocots and Red, A or C was preferred at the third position over U and G, respectively, for the codon, encoding the same amino acids; Green, U or G was preferred at the third position over A and C correspondingly. dicots (Begum and Mondal, 2020). The results of our ENc plot indicated that natural selection played a very important and dominant role in determining the selective constraints on codon bias in pqqB, pqqC, pqqD, and pqqE genes from Klebsiella, Kosakonia, Pantoea, Rahnella, and Serratia. Begum and Mondal (2020) also reported that the majority of the Chl synthesis and degradation pathway-associated genes in monocots and dicots preferred weak codon bias. Moreover, Rahman et al. (2018) also found relatively low codon usage bias and translation selection factor shaping codon usage pattern in Crimean-Congo hemorrhagic fever virus (CCHFV). Natural consortium of five acidophilic bacteria used for biomining preferentially has low codon usage bias (Hart et al., 2018). Likewise, in our study, Gammaproteobacteria pqqB, pqqC, and pqqE genes including a mass of pqqD genes from all selected groups of microorganisms preferred weak codon bias. A lack of strong codon usage bias is expected to promote efficient usage of more codons and thereby speed up the translation process (Jenkins and Holmes, 2003). It has been previously reported that stronger selection provides high translational accuracy to minimize the missense and nonsense errors (Stoletzki and Eyre-Walker, 2007;Hershberg and Petrov, 2008) and accelerates the translation elongation in protein expression (Ran et al., 2014), which is advantageous for genome stability in species evolution.

Identification of Optimal Codons
COA based on the RSCU values of the codons for every organism from pqq genes involved in the PQQ biosynthetic pathway exhibits optimal codons. Frequently used codons are known to be optimal codons, while less frequently used codons are nonoptimal codons. The codon usage bias ensures that the optimal codons can pair with the anticodons of the most abundant tRNA genes and avoids the misincorporation of amino acids to reduce processing errors. We have identified a total of 55 optimal codons (three-letter code for encoding amino acids) that encode 18 amino acids in pqqA-pqqE genes ( Table 4), among them, 6 are for pqqA, 17 for pqqB, 9 for both pqqC and pqqD, and 14 optimal codons for pqqE genes. Optimal codons ending with A/U and G/C in pqqA, pqqB, and pqqD signify that codon usage in selected organisms is biased to A/U-and G/C-ending synonymous codons. Contrastingly, in the pqqC gene, optimal codons end with A/U and C, suggesting that codon usage is biased to A/U and C-ending synonymous codons. Whereas, the pqqE gene possesses 13 optimal codons with G/C ending, and only 1 optimal codon ended with A/U (CGU). Optimal codons identified in our study might provide useful information for genetic engineering, gene prediction, and molecular evolution studies.

Gene Organization Study
The organization of the pqq genes from selected microorganisms in the PQQ operon was studied and deduced that the organization of pqqA-pqqE genes in PQQ operon from some Gammaproteobacteria was in the positive strand and some in the negative strand (Figure 8). Figure 8 revealed that the pqqD and pqqE genes from different strains of Cronobacter, Erwinia, Enterobacter, Klebsiella, Kosakonia, Mixta, Nissabacter, Pantoea, Rahnella, Rouxiella, and Serratia consistently overlapped. It has been observed that pqqB and pqqC genes from Enterobacteriaceae bacterium, K. aerogenes, K. pneumoniae, K. quasipneumoniae, Klebsiella quasivariicola, K. variicola, and P. dispersa were overlapped as well. Furthermore, there was a single character overlap in the pqqC gene from C. sakazakii, K. cowanii, M. calida, M. gaviniae, S. marcescens, Serratia sp., and S. ureilytica. PQQ biosynthetic pathway-associated genes have been extensively studied in several bacteria (Guo et al., 2009). They have reported that in R. aquatilis, pqqABCDEF might be regulated as an operon in which the pqqD and pqqE genes were fused. The relative order of pqqA to pqqF was identical for all four bacterial species such as Enterobacter intermedium 60-2G, K. pneumoniae NCTC418, A. calcoaceticus, and G. oxydans ATCC 9937, except that pqqF in P. fluorescens B16 existed in a different location (Guo et al., 2009). In P. fluorescens B16, a plant growth-promoting rhizobacterium, the genes pqqABCDEFHIJM are organized as an operon. However, genetic variation has been identified between different bacteria (Choi et al., 2008). Genetic organization of the pqqC gene in PQQ operon in Pseudomonas kilonensis JX22 was pqqF, pqqA, pqqB, pqqC, pqqD, pqqE, and pqqG, respectively (Xu et al., 2014). The seven genes have the same transcriptional orientation (Xu et al., 2014). The pqq operon gene order was pqqFABCDE in endophytic Pseudomonas (Oteino et al., 2015).

Study of Codon Preference Among the pqq Genes in the Positive and Negative Strand by Z-Test
Among 79 selected Gammaproteobacteria, 31 organisms were in the positive strand and 48 organisms were in the negative strand of the pqq genes. In the PQQ biosynthetic pathway, significant differences at p < 0.05 among the positive and negative strand of the pqq genes in their codon usage were observed according to the Z test (Supplementary Table 5). The pqq genes in the positive strand showed significantly high (p < 0.05) biasness in CUG (Leu), AAC (Asn), CCU (Pro), and GUG (Val). pqqA, pqqB, pqqC, and pqqD genes in the positive strand showed significantly high (p < 0.05) preferences toward AUU (Ile). pqqB, pqqC, pqqD, and pqqE genes in the positive strand showed significantly high (p < 0.05) biasness in CCU (Pro), CGU (Arg), ACC (Thr), and GUG (Val). pqqA, pqqB, pqqC, and pqqE genes in the positive strand showed significantly high (p < 0.05) preferences toward GGC (Gly) and CGC (Arg). pqqB, pqqC, and pqqE genes in the positive strand showed significantly high (p < 0.05) biasness in GCG (Ala), but pqqC gene in the positive strand also showed significantly high (p < 0.05) biasness in GCC (Ala). pqqC and pqqE genes in the positive strand showed significantly high (p < 0.05) biasness in AUC (Ile) and UAU (Tyr). pqqD and pqqE  genes in the positive strand showed significantly high (p < 0.05) biasness in AAG (Lys) and AGC (Ser).

Compositional Analysis of Amino Acid Frequencies of pqq Genes From Selected Gammaproteobacteria
There are 20 amino acids, but for a better understanding of their effect on different genes and organisms, the 20 amino acids had been classified into five groups based on their polarity and charge, i.e., Acidic polar (D and E), Basic polar (H, K, and R), Aromatic (F, Y, and W), Neutral non-polar (P, C, M, G, A, V, I, and L), and Neutral polar (Q, N, S, and T).
Average and standard deviation data of amino acid frequencies (from pqqA, pqqB, pqqC, pqqD, and pqqE geneencoded proteins) in Cronobacter, Klebsiella, Kosakonia, Pantoea, Rahnella, and Serratia are mentioned in Table 5. From Table 5 and Supplementary Table 6, it can be concluded that the frequency of acidic polar amino acid E (Glu) was comparatively high in pqqC, pqqD, and pqqE genes. The frequency of E was high in the pqqA gene of Serratia and Rahnella but quite low in the case of the pqqD gene of Cronobacter in respect to other Gammaproteobacteria. The aromatic amino acid W (Trp) was immensely less in frequency in each of the genes except pqqA and pqqC. Identically, the frequency of neutral non-polar amino acid M (Met) was also inadequate in all genes except pqqA. Furthermore, the frequency of neutral non-polar amino acid A (Ala) was high in pqqB, pqqC, pqqD, and pqqE genes and highest in the pqqD gene. Overall, the frequency of neutral non-polar amino acid C was very low (absent in pqqA), and the frequency of neutral polar S (Ser) was high in all pqq genes obtained from Gammaproteobacteria (except pqqD with lower S). Despite the frequency of neutral non-polar amino acid L (Leu) being miraculously high in every gene, the highest level of frequency has been found in the pqqA gene. In Klebsiella, the amount of the neutral polar amino acid T (Thr) was exceptionally low in comparison to other Gammaprotobacteria. Interestingly, in the pqqA gene, H (His), C (Cys), and Q (Gln) were absent. The elevated frequency of L (Leu) was biased for pqqA, pqqB, pqqC, pqqD, and pqqE genes. A (Ala) was also considerably biased for pqqB, pqqC, pqqD, and pqqE genes. Cysteine residues are capable of the formation of disulfide bonds, which plays an essential role in the stability and folding of the protein structure. The very low amounts of cysteine residues indicated that the protein gains its stability from other interactions as chances of disulfide bond formation are very low.

Estimation of GRAVY and AROMO
A higher GRAVY score of proteins represents the physicochemical properties of the integral membrane-bound characteristics, while a negative GRAVY score suggests the soluble nature of the protein (Kyte and Doolittle, 1982). In our study, the GRAVY scores of the protein sequences obtained from all the pqq genes involved in the PQQ biosynthetic pathway from selected Gammaproteobacteria except pqqC and pqqD gene of Serratia rendered a negative value, evincing the soluble nature (hydrophilic) of these proteins ( Table 6).
From the aromaticity score, we have found that the average AROMO value of the pqqA gene from selected Gammaproteobacteria was about 0.1, for pqqB, pqqC, pqqD, and pqqE genes that varied from 0.045 to 0.071, 0.066 to 0.095, 0.066 to 0.0092, and 0.056 to 0.106 in succession. From the amino acid frequency table (Table 5), it was observed that the average frequency of Phenylalanine (Phe) of the pqqB gene from Cronobacter, Kosakonia, Rahnella, and Serratia were 1.98 and for Klebsiella and Pantoea were 2.65 and 2.56, respectively. Similarly, the average Tyrosine (Tyr) frequency from Cronobacter, Kosakonia, Rahnella, and Serratia were 0.99. The overall frequency of Tryptophan (Trp) was 2.3 for pqqB genes. On the whole, the frequency of Phe, Tyr, and Trp was far better in other pqq genes than pqqB. In pqqD (5.1) and pqqE (4.64), the frequency of Phe seems much superior, whereas the average frequency of Trp of pqqE genes from all selected Gammaproteobacteria was low (1.57), and Phe frequency of pqqD from Rahnella was comparatively much higher (7.51). It was observed that Tyr frequency was extremely low in the pqqB gene from Cronobacter, Kosakonia, Rahnella, and Serratia in respect to other pqq genes.

Gene Adoptability Analysis
CAI is a ratio of the synonymous codon bias with a value that ranges from 0 to 1 of a gene to a highly expressed reference gene. An adequately higher CAI value indicates a strong bias of synonymous codon usage and a moderately high gene expression with potential gene adaptability (Sharp and Li, 1987;Gupta et al., 2004). To predict the probability of the degree of expression of the five pqq genes (pqqA, pqqB, pqqC, pqqD, and pqqE) among the 79 Gammaproteobacteria CAI was analyzed. In the case of pqqA, pqqB, and pqqE genes, Serratia holds the lowest values 0.336, 0.482, and 0.509, respectively, while the highest values were obtained from Pantoea, 0.652, 0.698, and 0.703, respectively ( Table 6). The CAI value of the pqqB gene from Rahnella was about 0.687, with an average value of 0.6 for others. Likewise, the lowest CAI values of the pqqC and pqqD genes were found in Serratia (0.533 and 0.532), whereas the highest values were occupied by Pantoea (0.708 and 0.711) and Rahnella (0.712 and 0.703). All other Gammaproteobacteria of pqqC and pqqD genes possess a CAI value of nearly 0.6 and 0.65, respectively. The CAI values in pqqC, pqqD, and pqqE genes from Pantoea and Rahnella were virtually alike. In contrast, the average CAI of the pqqB, pqqC, pqqD, and pqqE genes from different species of Gammaproteobacteria except for Serratia indicated a similar pattern with a comparatively high level of gene expression, reinforcing the verity of gene adaptability. The results demonstrate that gene expression level shaped codon usage in Klebsiella, Cronobacter, Kosakonia, Pantoea, and Rahnella, and the pqq genes with higher expression levels had a greater degree of codon usage bias and richer GC. Several studies on codon usage have revealed that natural selection on codon usage increases both translation accuracy and efficiency, which have long been known to affect gene sequences. Such selection is stronger on highly expressed genes, resulting in higher levels   of codon bias within genes with higher expression levels (Gouy and Gautier, 1982;Duret, 2000;Hershberg and Petrov, 2008). Selection on translation accuracy affects more strongly codons encoding conserved amino acids, since these will more often affect protein folding and/or function (Yannai et al., 2018). Yannai et al. (2018) demonstrated that selection on translation accuracy affects both lowly and highly expressed genes in E. coli.

CONCLUSIONS
In this article, an in silico approach was used to investigate the molecular evolution of the genes involved in the PQQ biosynthetic pathway among different species of Gammaproteobacteria. All the selected genes were present in an operon system maintaining pqqA-pqqE order either in the positive or in the negative strand. Phylogenetic analyses based on individual sequences of PqqA-PqqE proteins involved in the PQQ biosynthetic pathway depicted all the selected species getting clustered prominently according to their genera, such as Pantoea, Rouxiella, Rahnella, Kosakonia, Mixta, Cronobacter, and Serratia, suggesting their phylogenetic closeness. Moreover, all the phylogenetic trees, including 16S rRNA-based phylogeny, revealed a relatively close association of Klebsiella and Enterobacter; both of them belong to Enterobacteriaceae family. In this study, some of the sites within the MSA of those proteins that were not occupied by the identical character have been recognized to correlate the evolutionary relationship for a better understanding of the mutation prone sites within the sequences. GC1% and GC3% in pqqB, pqqC, pqqD, and pqqE genes were higher in the selected Gammaproteobacteria, revealing that these genes were GC-rich, whereas the average of GC1%, GC2%, and GC3% in the pqqA gene was comparatively low, revealing that the pqqA gene was AT-rich. The GC% at the third codon position was highest in all pqq genes of Cronobacter, which might be due to better adaptation to the environment. The content of GCs at different positions of the codons has also been properly reflected in RSCU values of the codons with few exceptions. Moreover, there is significant difference in their codon usage preference according to the position of the pqq genes; mostly G/C-ending codons have the preference for the genes in the positive strand and A/U-ending codons have the preference in the negative strand.
Phe for pqqD from Rahnella was higher than other Gammaproteobacteria. Tyr frequency was very low in the pqqB gene from Cronobacter, Kosakonia, Rahnella, and Serratia in respect to other pqq genes. The CAI of the pqqB, pqqC, pqqD, and pqqE genes from the selected species indicated high adaptability except for Serratia. The COA based on RSCU optimal codons in the pqqB gene ending with A/U are higher in number than those in the pqqE gene ending with G/C (except CGU). These optimal codons might provide significant information for gene expression prediction. The COAs suggested the mutational pressure on the pqq genes from selected organisms to shape the codon usage pattern; however, ENc and neutrality plot indicated natural selection. Besides, the pqq genes prefer weak codon bias. These systematic and comprehensive computational studies might be useful for the identification and characterization of the genes responsible for phosphate solubilization in other organisms.
To recognize the interface residues, the structural analysis of PQQ followed by molecular docking analysis with glucose dehydrogenase will be performed in the future. The variation of the interface residues and their efficiency for binding with glucose dehydrogenase as well as the production of gluconic acid will be quantified. PQQ, acquired from various genera of microorganisms with appropriate composition and better efficiency, might be competently recognized from the above methodical study.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Materials.

AUTHOR CONTRIBUTIONS
EB and SM conceived the study and designed and performed the experiment. YB, EB, RD, and SM analyzed the results. YB, RD, and EB wrote the manuscript. All the authors read, edited, and approved the manuscript.

ACKNOWLEDGMENTS
EB, RD, and SM are thankful to the Department of Biotechnology, the University of Burdwan, for the computational facilities. YB gratefully acknowledges Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, and also Centre of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-III), University of Calcutta, JD-2, Sector III, Salt Lake, Kolkata-700106, West Bengal, India, for infrastructural support.