Evolutionary Relationships Between Low Potential Ferredoxin and Flavodoxin Electron Carriers

Proteins from the ferredoxin (Fd) and flavodoxin (Fld) families function as low potential electrical transfer hubs in cells, at times mediating electron transfer between overlapping sets of oxidoreductases. To better understand protein electron carrier (PEC) use across the domains of life, we evaluated the distribution of genes encoding [4Fe-4S] Fd, [2Fe-2S] Fd, and Fld electron carriers in over 7,000 organisms. Our analysis targeted genes encoding small PEC genes encoding proteins having ≤200 residues. We find that the average number of small PEC genes per Archaea (~13), Bacteria (~8), and Eukarya (~3) genome varies, with some organisms containing as many as 54 total PEC genes. Organisms fall into three groups, including those lacking genes encoding low potential PECs (3%), specialists with a single PEC gene type (20%), and generalists that utilize multiple PEC types (77%). Mapping PEC gene usage onto an evolutionary tree highlights the prevalence of [4Fe-4S] Fds in ancient organisms that are deeply rooted, the expansion of [2Fe-2S] Fds with the advent of photosynthesis and a concomitant decrease in [4Fe-4S] Fds, and the expansion of Flds in organisms that inhabit low-iron host environments. Surprisingly, [4Fe-4S] Fds present a similar abundance in aerobes as [2Fe-2S] Fds. This bioinformatic study highlights understudied PECs whose structure, stability, and partner specificity should be further characterized.


INTRODUCTION
Redox-active cofactors are essential components of metabolism, functioning as molecules that transfer electrons at various reduction potentials, according to metabolic need. These pools of small molecules (e.g., NADH, NADPH, FADH, FMN, riboflavin, and quinones) can couple their reducing power to a wide range of oxidoreductases in parallel. For example, the quinone that functions in aerobic respiration within Escherichia coli interacts with two dozen oxidoreductases, while NADH/NADPH are used by over one hundred oxidoreductases in this microbe (Orth et al., 2011). What these small molecules lack, however, is the ability to evolve structures that discriminate partner binding and tune their midpoint potentials. In contrast, protein electron carriers (PECs) can tune both reduction potential and partner binding by modifying their amino acid sequences (Hosseinzadeh and Lu, 2016;Kim et al., 2016). This tunability is thought to enable the evolution of proteincontrolled, energy-conserving electron transfer (ET) pathways.
The Iron-Sulfur World Hypothesis, that life evolved within the cavities and capillaries of iron-sulfur enriched mounds, implicates ferredoxins (Fds) with [4Fe-4S] clusters as the earliest low potential PECs (Sousa et al., 2013). This idea is supported by the observations that these proteins represent the smallest PECs, having as few as 55 amino acids (Bertini et al., 1995), and the observation that iron and sulfur can readily combine to form iron-sulfur clusters under anaerobic conditions (Venkateswara Rao and Holm, 2004). With the advent of oxygenic photosynthesis and the Great Oxidation Event, the usage of O 2 -sensitive [4Fe-4S] Fds may have been disincentivized relative to the more O 2 -tolerant [2Fe-2S] Fds. Support for this idea comes from the observation that [4Fe-4S] clusters within canonical bacterial Fds have an exposed sulfido atom which can be attacked by O 2 , resulting in the release of iron and destruction of the cluster (Imlay, 2006;Jagannathan and Golbeck, 2009). With a hydrophobic patch covering their metallocluster, [2Fe-2S] Fds are more shielded from this degradation (Pierella Karlusich et al., 2014), while still capable of presenting a similar range of midpoint potentials (Atkinson et al., 2016). As a result,  Fds are thought to have emerged as the favored electron acceptor for the O 2 -evolving photosystem, proliferating in the new O 2 -rich world. While it is clear that some  Fds are more sensitive to O 2 than [2Fe-2S] Fds, the extent to which these two Fd types are used across different environmental niches has not been well established.
Gene duplication events during evolution have led to the growth and diversification of PECs (Onda et al., 2000;Terauchi et al., 2009;Peden et al., 2013;Cassier-Chauvat and Chauvat, 2014;Pierella Karlusich et al., 2014;Atkinson et al., 2016;Mellor et al., 2017;Burkhart et al., 2019). Biochemical and cellular studies of [2Fe-2S] Fd paralog specialization have been performed in a hyperthermophilic Archaeon, Thermococcus kodakarensis (Burkhart et al., 2019), and three model photosynthetic organisms: Synechocystic sp. PCC6803, Zea mays, and Chlamydomonas reinhardtii (Onda et al., 2000;Terauchi et al., 2009;Cassier-Chauvat and Chauvat, 2014). These studies revealed variations in the pattern of expression as well as differences in partner binding. However, we still lack basic information on how these and other low potential PECs evolved and specialized across the tree of life. While we have some limited information on the number of Flds and [4Fe-4S] Fds in phototrophs, beyond these organisms estimates of PEC distribution are scarce. We know that Flds often replace [2Fe-2S] Fds under iron limited conditions, both in free-living organisms and hostassociated organisms (Freigang et al., 2002;Goñi et al., 2008), but a quantitative description at the genomic level is lacking.
To gain a better understanding of how evolution has selected extant PECs, we report on the genome mining of 7,079 organisms for sequence motifs that are characteristic of three different low-potential PEC families, including the [4Fe-4S] Fds, [2Fe-2S] Fds, and Flds. We show that many organisms have large pools of small PEC genes, with 50% of our analyzed organisms possessing six or more small PEC genes, including members of all three PEC types. We find that PEC pools vary across genomes, with some organisms lacking small PEC genes and others having >50 total small PEC genes. We also report that PEC genes elongate in organisms having multiple PEC-encoding genes and under certain environmental conditions.

Genome Mining
We harvested 7079 genomes from the Joint Genome Institute (JGI) with the "finished" sequencing status and scanned them for genes matching Interpro sequence signatures for Fld/nitric oxide synthase (IPR008254), [2Fe-2S] Fd-cluster binding domain (IPR001041), and [4Fe-4S] Fd-cluster binding domain (IPR017896) (Finn et al., 2017). Interpro annotation was chosen because it synthesizes signatures from multiple databases with complementary but distinct annotation strategies, such as Pfam and PROSITE, which match for proteins on the basis of Hidden Markov Models and shared alignment profiles, respectively (Finn et al., 2017). Genes with over 200 amino acids were excluded from the dataset to focus the analysis on small PECs. Metadata for each genome was also downloaded and used for environmental niche analysis. Sequences obtained from genomemining were collected in a FASTA format and analyzed for Interpro (IPR) sequence signatures with InterProScan 5 (Jones et al., 2014). Supplementary Dataset 1 contains a list of organisms with the number of genes having each IPR signature and metadata about ecosystem characteristics. It also contains a list of the individual PECs collected after genome mining, with their sequences and the metadata of their host organisms.

Pairwise Analysis of PEC Abundance
The gene counts for each PEC type were plotted against one another in heatmaps generated using the Matplotlib python package (Hunter, 2007). A log-scaled color gradient was used to illustrate the number of organism with each PEC count combination. For ternary plot analysis, organisms were binned on the basis of their relative PEC pool composition. For each organism, counts of each PEC family were normalized such that they summed to 100%. All combinations of PEC pools were divided into bins with step sizes of 10%. If an organism's PEC composition placed it at the boundary of two or more bins, it was randomly placed in one of the adjoining bins. The density of organisms in each bin were visualized using a viridis color gradient. Results were plotted using the python-ternary package (Harper et al., 2015).

PEC Phylogeny Mapping
Evolutionary analysis was performed using a previously described evolutionary tree derived from concatenated protein sequences (Hug et al., 2016). The tree was pruned down to the 351 organisms present in our dataset using the Environment for Tree Exploration (ETE) 3 python toolkit (Huerta-Cepas et al., 2016). The IPR-matching PEC gene counts were placed at each organisms leaf in the phylogenetic tree in a stacked bar graph. Data was visualized using the Interactive Tree of Life webservice (Letunic and Bork, 2016).

Environmental Niche Analysis
Organisms were sorted on the basis of their JGI metadata labels. Organisms labeled thermotolerant were placed into the thermophile bins, while those labeled psychrotolerant and psychrotrophic were placed in the psychrophile bin. A heatmap was generated by plotting the average PEC gene count for organisms at each O 2 niche and growth temperature bin. Average values were rounded to two significant figures. Average PEC counts for external pH values were calculated by dividing PEC counts by the number of organisms observed at each pH.

PEC Length Analysis
To generate plots showing PEC length distributions and abundances, organisms were divided into different categories (PEC gene count, O 2 requirement, growth temperature), and the length distributions in each category were smoothed using kernel density estimation via the Matplotlib package (Hunter, 2007).

Statistics
The weighted pH average of each PEC type was calculated, along with the weighted standard deviation, and the three distributions were compared using a paired, two-tail t-test calculated using NumPy (van der Walt et al., 2011).

Genome-Mining Strategy
To understand how low-potential PEC usage varies in nature, we downloaded 7079 genomes with the sequencing status "Finished" from the Integrated Microbial Genomes and Microbiomes database (Chen et al., 2019). A majority of the genomes were from Bacteria (n = 6,733), although Archaea (n = 275) and Eukaryotes (n = 71) were represented. We scanned each genome for genes having Fld, [2Fe-2S] Fd, and [4Fe-4S] Fd Interpro sequence signatures (Finn et al., 2017). We excluded all gene matches encoding proteins above 200 amino acids. This size cutoff was chosen because it is greater than the largest [4Fe-4S] Fd (106 residues: 7FDR), [2Fe-2S] Fds (128 residues: 1E0Z), and Flds (184 residues: 2ARK) reported in the PDB (Schipke et al., 1999;Marg et al., 2005). By using a size cutoff that exceeds the length of PECs with single domain structures in the PDB, we sought to evaluate how the size of these three PEC families varies beyond the family members that have already been studied. We applied the same size cutoff to all three PEC types, even though they differ in average size within the PDB. Their average sizes are 77 ([4Fe-4S] Fds), 111 ([2Fe-2S] Fds), and 160 (Flds) residues. By using this approach, our analysis allowed the comparison of PEC evolution across the same size range. Additionally, the use of a 200 amino acid cutoff minimized Fld false positives, as IPR008254 also identifies nitric oxide synthases, enzymes that can be >400 amino acids (Fischmann et al., 1999;Fedorov et al., 2003). Because sequence additions frequently do not abolish the function of the ancestral domain (Björklund et al., 2005), we posited that many of the gene sequences detected with our approach encode proteins capable of ET like known Fds and Flds.

Organisms Differ in Their PEC Gene Counts
Organisms were initially sorted by taxonomic domain to visualize how gene abundances vary. Figure 1 shows that Archaea have the largest average number of PEC genes per organism (μ = 12.7 ± 5.7), heavily favoring genes with [4Fe-4S] Fd IPR signatures.This observation is in line with previous observations that archaeal metabolisms depend more heavily on non-heme iron-binding proteins than those of the other two domains (Andreini et al., 2007). Eukaryotes present the smallest number of PEC genes per organism (μ = 2.8 ± 5.3), utilizing genes with [2Fe-2S] Fd signatures to the greatest extent. Bacteria have an intermediate level of PEC genes (μ = 7.9 ± 6.2) and favor a more balanced composition. We next quantified the abundances of genes encoding each PEC type within each genome. Figure 2A shows that [4Fe-4S] Fd genes occur with the greatest abundances, with more than 400 organisms having ten or more [4Fe-4S] Fd genes. In contrast, there are only 162 organisms with 10 or more [2Fe-2S] Fd genes ( Figure 2B), and only one organism having more than 10 Fld genes ( Figure 2C). The total counts of genes encoding all three PECs were also summed together for each organism ( Figure 2D). This analysis reveals that 50% of all organisms analyzed have 6 or more PEC genes, and 3% of the organisms lack small PEC genes.
PECs can at times transfer electrons to overlapping sets of partner proteins, suggesting that some have evolved as interchangeable ET hubs. In some organisms, Fds are used for ET under iron-rich conditions, while Flds support ET among the same partner oxidoreductases under limiting iron conditions (Demuez et al., 2007;Goñi et al., 2008;Pierella Karlusich et al., 2014). To evaluate if the abundances of the different PEC gene types are proportional in some organisms or if there are differences in the relative abundances, we compared the pairwise counts of the different PEC types within each genome. All three pairwise comparisons presented a wide range of abundance combinations. In the case of the Fld and [2Fe-2S] Fd comparison ( Figure 3A), most organisms have <4 genes encoding each PEC type (64%). A small fraction of organisms (3%) contain ≥4 genes encoding both PEC types, and the remainder have ≥4 of one PEC type and <4 of the other type (33%).
The pairwise relationships between [4Fe-4S] Fds and Flds ( Figure 3B)  We next calculated the abundance of genes encoding all three PEC types in each genome and visualized this data using a ternary plot (Figure 4). The values obtained for each organism were distributed into bins based on the relative percentages of each gene type. With this analysis, ~20% of the organisms had only one type of PEC gene, a small subset lacked all three PEC types (3%), and the remainder of the organisms had two or three PEC types encoded in their genomes (77%). One of the most popular PEC gene pool makeups is near the middle of the ternary plot, representing organisms with 40-50% [4Fe-4S] Fd, 10-20% [2Fe-2S] Fd, and 30-40% Fld genes. This hotspot is attributed to the high number of proteobacteria in our data set, which dominate this makeup.
The ternary plot also reveals PEC combinations that are absent from the dataset organisms. For organisms that lack [4Fe-4S] Fd genes, only a subset of the possible [2Fe-2S] Fd and Fld gene mixtures are observed. Under this constraint, Fld and [2Fe-2S] Fd gene specialists are most common. This trend is consistent with the pairwise PEC comparisons that show organisms only possess large numbers of [2Fe-2S] Fd or Fld genes in cases where they also possess multiple [4Fe-4S] Fd genes. Taken together, these results indicate that genes encoding [4Fe-4S] Fds are widespread with a high degree of penetration in many genomes and provide further evidence that ancient organisms were likely dependent upon the [4Fe-4S] Fds family before the expansion of the other two families.

PEC Gene Evolution
To understand how PEC gene counts vary across extant organisms, we mapped our data onto an evolutionary tree (Hug et al., 2016). For visualization, we pruned the phylogenetic tree down to those organisms represented in our data set (n = 351). We show the abundance for all 351 organisms in a bar graph at each of the leaves on a tree ( Figure 5) and the average abundance in the major taxonomic groups (Table 1). From this analysis, it appears that evolution has selected for the expansion of homologs of all three PEC types more than once. The [4Fe-4S] Fds are most widespread in overall abundance, being dispersed through most organisms. However, this PEC type clearly spikes in average abundance (>10 paralogs) within the Archaea, Firmicutes, Deltaproteobacteria, and Betaproteobacteria. The [2Fe-2S] Fds present the greatest abundances within phototrophs (plants, Cyanobacteria), Alphaproteobacteria, and Betaproteobacteria. The Flds are most abundant within a subset of the Firmicutes (Bacilli), Gammaproteobacteria, and in parasitic organisms. Below, we describe the trends in greater detail for individual taxonomic groups that are well represented in our data set.

Eukaryota
In our data set, this kingdom has the smallest number of genomes, including algae, a protozoan, a plant, fungi, and Homo sapiens. These organisms exhibit highly variable PEC compositions, although they all possess [2Fe-2S] Fds. Corn (Zea mays) possesses the largest number of PECs of any organism on the tree (n = 42) and the fourth most PEC genes of any organism in the dataset, utilizing a mix of all three types. This observation is consistent with an ancient plant duplication event and a high rate of retention of duplicate genes in plant genomes (Panchy et al., 2016). With the exception of the green algae Ostreococcus lucimarinus, which contains 11 PEC genes, all other organisms in this domain possess small numbers of PEC genes. Homo sapiens uses two [2Fe-2S] Fds (Sheftel et al., 2010), while fungi have incorporated one [2Fe-2S] Fd and up to two Flds. Both human Fds are nuclearencoded and translocated to mitochondria (Sheftel et al., 2010). Additionally, the chloroplasts in phototrophs are hotbeds of PECs (Pierella Karlusich and Carrillo, 2017). Taken together, these observations support the idea that some organisms compartmentalize PECs within organelles that function in light harvesting and energy production (Sheftel et al., 2010;Scheibe and Dietz, 2012).

Archaea
This domain has the highest average number of PEC genes per genome. The [4Fe-4S] Fds are the most popular PEC type in most Archaea represented. The exception to this trend is the Haloarchaea, which predominately uses [2Fe-2S] Fds. This class is noted for having many members that live aerobically and in extreme saline conditions (Sorokin et al., 2017). This observation is interesting because Haloarchaea are thought to have acquired >1,000 genes from methanogens in a single gene transfer event, and because methanogens are generally anaerobic and dominated by [4Fe-4S] Fds. It seems unlikely that Haloarchaea received large numbers of [2Fe-2S] Fds from this gene transfer event, although they received other oxidoreductases from methanogens, such as pyruvate:Fd oxidoreductase (Nelson-Sathi et al., 2012). Haloarchaea possess [2Fe-2S] Fds with a high-degree of similarity to planttype Fds, with the exception of a ~30 amino acid addition near the N-terminus, which has been hypothesized to have entered the organism through gene transfer from cyanobacteria (Pfeifer et al., 1993;Marg et al., 2005;Nelson-Sathi et al., 2012). Surprisingly, a few archaea lack genes with Fd and Fld IPR signatures. Two organisms in the phylogenetic tree (Archaeoglobus fulgidus DSM 8774 and Palaeococcus pacificus DY20341) lack PEC genes, although other organisms from their respective classes that are present in the genome mining dataset contain PEC encoding genes (Supplementary Dataset 1).

Cyanobacteria
The PEC gene distribution in these organisms most closely matches the Haloarchaea, with averages of 8.6 [2Fe-2S] Fd, 4.5 [4Fe-4S] Fd, and 1.0 Fld genes per genome. This trend makes this one of the richest phyla in terms of consistent, uniform abundance across all three PEC types. This representation is thought to arise in part because Cyanobacteria express Flds instead of [2Fe-2S] Fds under iron-limiting conditions (Pierella Karlusich et al., 2014). Rather than contributing to management ET in chloroplasts, the [4Fe-4S] Fds in these organisms have been implicated in oxidative and metal stress response pathways (Cassier-Chauvat and Chauvat, 2014).

Chloroflexi
Some of these organisms have genes encoding all three PEC types like Cyanobacteria. However, this phylum has lower average PEC gene counts, and only a subset of Chloroflexi have Fld genes. This latter trend is thought to occur because the Chloroflexi in our tree have highly divergent life strategies, including the mesophilic anaerobic organohalide respirer Dehalococcoides mccartyi (Löffler et al., 2013), the aerobic predator Herpetosiphon aurantiacus (Kiss et al., 2011), and the aerobic thermophile Thermomicrobium roseum (Wu et al., 2009).

Actinobacteria
These Gram-positive bacteria vary in their PEC specialization, with some having 100% [4Fe-4S] Fd genes (e.g., Adlercreutzia equolifaciens) and others having 75% [2Fe-2S] Fd genes (e.g., Saccharopolyspora erythraea). Like Chloroflexi, only a subset of the genomes contain Fld genes. These organisms range widely in their total PEC gene counts, with some lacking PEC genes (Dermacoccus nishinomiyaensis) and others having 31 PEC genes (Pseudonocardia dioxanivorans). This wide variation appears to be the result of a split within the Actinobacteridae class, leaving one half PEC-rich and the other half PEC-scarce. Each half encompasses organisms with many life strategies. Two notable representatives of the PEC-rich organisms include the symbiotic nitrogen-fixing Frankiaceae and the antibioticproducing Mycobacteriaceae (Ventura et al., 2007). The PEC-poor half harbors at least six bacterial families (Kineosporiaceae, Dermacoccaceae, Promicromonosporaceae, Actinomycetaceae, Bifidobacteriaceae, Microbacteriaceae) which include many soil bacteria, the gamma-radiation resistant Kineococcus radiotolerans (Phillips et al., 2002), and the gastrointestinal-tract inhabiting Bifidobacteriaceae (Ventura et al., 2007). The clustering of PEC-poor and PEC-rich organisms suggests that there has been evolutionary pressure for these trends, but the underlying cause of that pressure is not known.

Firmicutes
This phylum is split into two major branches: Clostridia and Bacilli. The Clostridia mirrors Archaea in their high total numbers of PEC genes and high [4Fe-4S] Fd abundances. Bacilli, in contrast, contain low total numbers of PEC genes. While different PEC gene types are observed in Bacilli, Flds are most common. In our dataset, Bacilli are largely represented by members of the Lactobacillales order. There are two reasons why these lactic acid bacteria may utilize few Fds. First, most lactic acid bacteria colonize iron-poor environments and have evolved metabolisms that support growth without iron (Duhutrel et al., 2010). Second, many lactic acid bacteria produce H 2 O 2 in unusually large quantities as part of their metabolism (Imlay, 2019). The harmful Fenton chemistry that can arise from high H 2 O 2 and iron may have selected for these bacteria to evolve pathways dependent on Flds rather than iron-containing Fds.

Bacteroidetes
These microbes have diverse PEC gene pools, with total gene counts that vary between zero and twelve. They are unique in possessing organisms that are PEC specialists, with individual members that contain genes encoding only [4Fe-4S] Fds or Flds. The Fld specialists are confined to the Bacteroidales order, which includes the Fld-specialist Parabacteroides distasonis and the PEC-generalist Porphyromonas gingivalis. Both of these species are commensal and pathogenic bacteria associated with the oral cavity (Naito et al., 2008;Kverka et al., 2011). The [4Fe-4S] Fd specialists include an endosymbiont of amoebas with a reduced genome and severely limited biosynthetic capabilities (Candidatus Amoebophilus asiaticus), and an aerobic gliding soil bacterium that digests cellulose (Cytophaga hutchinsonii) (Penz et al., 2010;Zhu and McBride, 2017).

Delta and Epsilon Proteobacteria
These organisms encode [4Fe-4S] Fd genes at higher levels in their genomes compared with the other two PEC types. The sulfur-reducing Desulfobacula toluolica is a standout member that exemplifies this trend. This organism possesses twenty-two [4Fe-4S] Fd, five [2Fe-2S] Fd, and four Fld genes. One unusual order in the Deltaproteobacteria class is Myxococcales. These strict aerobic organotrophs metabolize macromolecules like cellulose, and despite having some of the largest bacterial genomes which are 9 to 10 million nucleotides in length, the two Myxococcales species in the pruned phylogenetic tree possess only five PEC genes (Dawid, 2000). In comparison, another myxobactera, Stigmatella aurantiaca DW4/3-1, which is not present in the phylogenetic tree, has seven PEC encoding genes (Supplementary Dataset 1). Additionally, this order stands out from the rest of the Delta and Epsilon Proteobacteria because it has predominately [2Fe-2S] Fd genes.

Alphaproteobacteria
Organisms within this class range from having 0 to 28 PEC genes, with a majority having genomes with at least one [4Fe-4S] Fd and one [2Fe-2S] Fd gene; only half have a Fld gene. Alphaproteobacteria with the highest PEC abundance are in the nitrogen-fixing Rhizobiales order (μ = 10.6 ± 7.1). In these organisms, the most abundant PEC genes encode [2Fe-2S] Fds. For instance, Bradyrhizobium japonicum has genes encoding nineteen [2Fe-2S] Fds, seven [4Fe-4S] Fds, and two Flds. The two-stage life cycle of Rhizobia, which includes bacteroid and free-living cells, may explain this abundance of PEC genes. In the bacteroid stage, Rhizobia derive reducing equivalents from host roots and, as they are metabolically active but not growing, must often shunt a large portion of those reducing equivalents into lipogenic pathways to maintain redox balance (Terpolilli et al., 2016). This coupling enables nitrogen assimilation, a process which requires Fds for transfer of low potential electrons from central metabolism to nitrogenase (Terpolilli et al., 2016). In contrast, free-living Rhizobia often gain a competitive advantage by metabolizing the secondary products excreted by a bacteroid of the same species, using oxidative pathways that are speciesspecific and require specialized PECs (Bahar et al., 2000).
One member of the Alphaproteobacteria (Ensifer adhaerens OV14) lacks PEC genes. As an opportunistic predator of the rhizosphere, E. adhaerens has recently generated interest as an alternative to Agrobacterium tumefaciens for the transformation of plants (Rudder et al., 2014). Often living in the nutrient-rich nodules of the rhizosphere, it is thought to depend upon the metabolisms of plants or other rhizosohere microbes to supplement the loss of some PEC-dependent pathways (Rudder et al., 2014).

Betaproteobacteria
These organisms also have a large variability in PEC gene numbers in each genome, with one organism having 29 PEC genes (Azoarcus sp. KH32C) and some containing only three PEC genes (Nitrosomonas europaea, Candidatus Profftella armatura, and Candidatus Kinetoplastibacterium oncopeltii TCC290E). One interesting order within this class, Nitrosomonadales, has genes encoding only [4Fe-4S] Fds. This observation can be contrasted with other Betaproteobacteria genomes, which frequently encode at least one [2Fe-2S] Fd and one Fld. Both of the organisms that are [4Fe-4S] specialists are notable wastewater bioremediators, Nitrosomonas europaea and Nitrospira multiformis, each of which oxidize ammonium to nitrite (Arp et al., 2002;Norton et al., 2008). How these organisms reduce  Fds is an open question. N. europaea grows autotrophically on NH 3 , CO 2 , and mineral salts alone (Arp et al., 2002), harvesting reducing power from NH 3 to produce NADH (Guo et al., 2013) which has a redox potential of−320mV (Huang et al., 2012), well above the potential of many [4Fe-4S] Fds which can range from−280 mV to −650 mV (Atkinson et al., 2016). Measuring the reduction potential of [4Fe-4S] Fds in nitrifying organisms may reveal that they are shifted more positively than canonical Fds.

Gammaproteobacteria
Organisms in this class are highly balanced in their usage of all three PEC types. This trend may arise because this class contains many pathogens. Endosymbiotic organisms often struggle to scavenge sufficient iron from the host environment, so the substitution of Flds for iron-demanding Fds may have occurred to support fitness under iron-limiting conditions (Litwin and Calderwood, 1993). The Enterobacteriales order, which contains pathogens such as Salmonella enterica and Yersinia pestis, stands out as being highly populated by all three PEC types, presenting a high percentage (up to 50%) of Flds.

Additional Phyla
Members of Planctomycetes, Verrucomicrobia, Chlamydia, Acidobacteria, and Nitrospirae are represented within the tree, but only in low numbers. In these organisms, [4Fe-4S] Fds are most abundant, although all three PEC types are observed. Additionally, our tree contains small numbers of organisms from the Fusobacteria, Deinococcus, Aquificae, Thermotogae, and Chlorobi phyla. In Fusobacteria, Flds genes are most abundant among the three PEC types, although there are members of this phylum with mixed PEC usage. Many Fusobacteria participate in polymicrobial infections of the respiratory tract and other anaerobic mucosal surfaces (Bennett and Eley, 1993). In these niches, iron availability is predicted to be limited, so Fusobacteria may use Fld genes as part of an evolutionary pressure to be iron frugal. Deinococcus, Aquificae, Thermotogae, and Chlorobi are more dominated by [4Fe-4S] Fds and have mixed PEC usage.

PEC Usage Varies With Environment Niche
Low potential PECs differ in their iron requirements and stabilities in the presence of atmospheric O 2 concentrations (Jagannathan et al., 2012;Pierella Karlusich et al., 2014;Holm and Lo, 2016). To determine if an organism's environmental niche correlates with PEC abundance, we evaluated how an organism's preferences for O 2 , external pH, and temperature relate to the average number of PEC genes found in a genome. Figure 6A shows that organisms with distinct O 2 requirements differ in their PEC usage. The [4Fe-4S] Fds, whose cofactors can be damaged by O 2 (Hsueh et al., 2013), represent the majority of PECs within anaerobes and obligate anaerobes.  (Tilley et al., 2001). Support for the last mechanism has come from successes in purifying Fds with [3Fe-4S] clusters that remain stable under aerobic conditions (Aono et al., 1989;Gomes et al., 1998;Ricagno et al., 2007). Additionally, some [4Fe-4S] clusters lose a single Fe atom when placed in an oxidizing atmosphere, rather than losing the entire cluster (Beinert et al., 1996;Tilley et al., 2001). These findings have led some to posit that Fds that bind one [3Fe-4S] cluster and one [4Fe-4S] cluster (i.e., 7Fe Fds) evolved from ancestral Fds that coordinate two [4Fe-4S] clusters to tolerate O 2 (Iwasaki et al., 1994;Tilley et al., 2001).
To gain insight into the role that an aerobic atmosphere had on the evolution of 7Fe Fds, we used InterProScan software to analyze the distribution of [3Fe-4S] ferredoxins. For this analysis, we examined the abundance of genes with the IPR signature for 7Fe ferredoxins (IPR000813). This analysis revealed that the [4Fe-4S] Fd genes in obligate aerobes and aerobes matched the 7Fe Fd signature 30 and 27% of the time, respectively (Supplementary Dataset 1). Facultative microbes yielded matches with only 7.8% of the [4Fe-4S] Fd genes, while anaerobes and obligate anaerobes yielded even lower percentages. These trends provide support for the idea that Fds with a [3Fe-4S] metallocluster are more stable under aerobic conditions. We next investigated how PEC usage relates to growth temperature ( Figure 6B). Genes encoding [4Fe-4S] Fds were most prevalent in thermophilic and hyperthermophilic organisms, with abundances of 80% and 88% respectively. These thermotolerant organisms also use [2Fe-2S] Fds and Flds, with the [2Fe-2S] Fds being >2-fold more prevalent than Flds. As optimal growth temperature decreases, the [4Fe-4S] Fds are partially replaced by [2Fe-2S] Fds and Flds.
To investigate how PEC gene abundance relates to both growth temperature and O 2 preference, we generated heat maps that compare the average number of PEC genes for different combination of conditions. Figure 7A shows that organisms with the greatest abundances of [4Fe-4S] Fd genes live at high temperatures in the absence of O 2 , and those with the lowest abundances live at lower temperatures in the presence of O 2 . Figure 7B illustrates how the organisms with the highest [2Fe-2S] Fd gene counts are mesophiles and obligate aerobes. This analysis also shows that [2Fe-2S] Fds present the lowest abundances at extreme temperatures in the absence of O 2 . Figure 7C shows that Fld genes are most abundant in psychrophiles that are facultative, and this comparison reveals that Flds present similar low abundance across thermotolerant microbes that live in the presence of different O 2 concentrations. The underlying cause of this trend is not known. However, the paucity of Flds at high temperatures could be caused by the accelerated degradation rates of free FMN cofactor at higher temperatures (Daniel and Cowan, 2000). Figure 7D shows the number of genomes used to analyze PEC abundance in each growth niche category.
Elevated pH can lead to the formation of iron hydroxides, which can decrease the concentration of accessible iron. To determine if organisms that grow in niches with high pH are enriched in PECs that use organic FMN rather than Fe-S clusters as cofactors, we compared PEC usage in organisms for which data was available on the pH of their ecological niche (n = 246). All three PEC types were found in organisms that grow optimally at the extreme ends of the pH scale (Figure 8). The [2Fe-2S] and [4Fe-4S] Fds are found in lower average external pH environments than Flds, although t-tests comparing the distributions found no significant difference in their mean values.

PEC Length Distributions
In organisms with multiple PEC paralogs, studies have revealed that homologs can specialize and evolve partner specificity that allows individual PECs to transfer electrons efficiently to a subset of oxidoreductase partners in an insulated manner (Terauchi et al., 2009). Biochemical studies have shown that this specificity can be achieved by altering the physicochemical properties of the PEC surface to tune binding rates and affinities for specific partners (Akashi et al., 1999), using allosteric conformational changes upon partner binding to regulate ET rates (Tyson et al., 1972;Schlesier et al., 2016), and increasing protein length to create protrusions that sterically hinder binding to some partners (Aoki et al., 1998). With the last mechanism, one would expect that organisms having a single PEC would exhibit smaller average sizes than those family members found in organisms having multiple PECs. To investigate this idea, we sorted PECs by the total numbers of PECencoding genes within each genome, and we plotted the distribution of lengths for each PEC type against the total number of PEC genes within genomes.
As the total number of PECs encoded in a genome increases, the average length of each PEC type increases, and the variance around each mode increases. In the case of [4Fe-4S] Fds, multiple distinct modes appear at total PEC abundances greater than two ( Figure 9A). These are centered around ~70, ~110, and ~170 residues. The relative abundance of each mode changes with total PEC abundance. At lower total PEC abundances (<10), the two smaller modes dominate and blur into a single continuous mode. As total PEC abundance increases from 10 to 20, the largest mode becomes dominant. At even higher PEC abundances, the lower two modes are most prevalent and appear as a single smear. The [4Fe-4S] Fds in the PDB have a distinct size distribution. Structurally-characterized [4Fe-4S] Fds have lengths ranging from 54 to 106 amino acids ( Figure 9B). The protein data bank lacks family members above this length regime, like those in our dataset.
For [2Fe-2S] Fds ( Figure 9C), a single mode is observed with organisms that have a single PEC, and two or three modes are observed in organisms having multiple PEC genes. These modes occur at ~90, ~Ί 10, and ~160 residues. The size dispersion of the data around the longer modes is greater than that observed with smallest mode.  Figure 9D). However, analysis of the size distribution of structurally-characterized [2Fe-2S] Fds reveals that we lack structural information on Fds from the largest mode (~150).
Flds ( Figure 9E) clearly present two modes as the number of PECs in an organism exceeds ten, which are centered around 145 and 170 residues. This observation suggests that previously described short and long-chain Flds represent two distinct subtypes that are widespread in nature (López-Llano et al., 2004). In cells with even higher PEC abundances, a third Fld size appears having a mode near 200 amino acids. Conducting BLAST searches on these proteins reveals high similarity to the WrbA family of flavoproteins, which are distinct from short-and long-chain flavodoxins in their capacity to conduct two-electron oxidation and reduction reactions which they utilize in their role as NAD(P)H:quinone oxidoreductases (Patridge and Ferry, 2006;Andrade et al., 2007). While it is clear that traditional Flds and previously studied WbrA proteins differ in their ET roles, we do not know if any natural variants exist that can promiscuously perform both functions. The similar structures of the two protein subfamilies suggests that WbrA homologs may be able to bind to natural Fld oxidoreductase partners, at least transiently. Future biochemical studies will be needed to test this idea. Indeed, analysis of the size distribution of structurally-characterized Flds reveals that only a handful (~10) of each subtype have been studied ( Figure 9F).

Oxidizing Conditions and PEC Size Distribution
To probe the relationship between O 2 growth requirements and PEC lengths within a genome, we binned organisms by ecological niche and evaluated the average size within the modes observed under each condition. With the [4Fe-4S] Fds ( Figure 10A), most organisms presented two major modes. The protein length for the larger mode was centered near 170 residues in all cases. However, the average size for the smaller mode increases with the transition from low O 2 (obligate anaerobes) to high O 2 (obligate aerobes) conditions. The [4Fe-4S] Fds in aerobes have a length distribution that is devoid of [4Fe-4S] Fds with lengths (≤60 residues) that are characteristic of the prototypical Clostridial Fds (Bertini et al., 1995;Atkinson et al., 2016). This finding suggests that only certain [4Fe-4S] Fds can support ET under the oxidizing conditions where these organisms grow optimally.
When examining [2Fe-2S] Fds length across different O 2 growth requirements, a distinct trend is observed from the [4Fe-4S] Fds ( Figure 10B). The lengths of [2Fe-2S] Fds on the anaerobic end of the spectrum are distributed around a single mode that is centered near 155 residues. In organisms that tolerate and require O 2 , additional modes appear, which are centered at ~80 and 105 amino acids. The underlying cause of this trend is not known. In aerobic organisms, the greater abundance of small [2Fe-2S] Fds may arise to support ET that is challenging for [4Fe-4S] Fds, due to their sensitivity to oxidation.
Interestingly, Flds become longer on average as O 2 growth requirements increase ( Figure  10C). The average Fld length is ~160 residues within obligate anaerobes and ~180 residues

NASA Author Manuscript
NASA Author Manuscript in obligate aerobes. In most cases, the Fld lengths are diffusely distributed around their means, with the exception of facultative organisms, which present multiple modes with small variances. There appears to be three Fld lengths that are highly popular in facultative species. Notably, of all the O 2 requirement groups, we found the most Flds in facultative species ( Figure 10C). These organisms are noted for being some of the most enthusiastic Fld adopters with an average percent incorporation of 32% ( Figure 6A). Lastly, it is worth noting that in aerobes and obligate aerobes, the Fld length distributions seems to butt up against the ceiling of the 200 amino acid cut-off. This observation suggests that this size cutoff may miss some Flds and that the true length averages for aerobes and obligate aerobes are higher than shown.

Optimal Growth Temperature and PEC Size Distribution
To determine how optimal growth temperature relates to PEC gene lengths, we binned PEC types by growth temperature and evaluated the average size under each condition ( Figure  11). Under thermophilic and hyperthermophillic conditions, genes encoding [2Fe-2S] Fds and Flds are longer on average than their mesophilic and psychrophilic counterparts. In contrast, there is marked shift toward smaller [4Fe-4S] Fds in thermotolerant organisms. This latter trend is consistent with the proposed emergence of [4Fe-4S] Fds before [2Fe-2S] Fds and Flds in ancient thermophilic archaea (Sousa et al., 2013).

DISCUSSION
Our bioinformatics analysis reveals that genes encoding low potential PECs are abundant within genomes from organisms across the tree of life. In the 56 phyla studied herein, we found 98% of them harbor at least one organism with a PEC gene. While kingdom level analysis revealed that the average number of PEC genes per organism decreases as one goes from Archaea (~13) to Bacteria (~8) and Eukarya (~3), a large amount of variation was observed within each kingdom. For example, one bacterium contains as many as 54 total small PEC-encoding genes (Desulfitobacterium hafniense DCB-2), 53 of which match for [4Fe-4S] PEC genes, and one eukaryote presented 42 PEC genes (Zea mays). As one considers the individual protein families analyzed, over 500 organisms presented 10 or more paralogs from either the [4Fe-4S] or [2Fe-2S] Fd families. In contrast only one organism had 10 or more Fld paralogs. The reason why iron-sulfur cluster containing PECs evolved larger numbers of paralogs within extant organisms is not known.
Although many lifeforms maintain an extensive PEC gene pool, suggesting a need for distinct physiological roles for each paralog (Onda et al., 2000;Terauchi et al., 2009;Peden et al., 2013;Cassier-Chauvat and Chauvat, 2014;Atkinson et al., 2016;Mellor et al., 2017;Burkhart et al., 2019), some organisms in our dataset lack annotated genes encoding small (<200 amino acids) PECs. These organisms span over a dozen phyla and all three domains of life. Given the phylogenetic diversity of organisms that lack annotated PEC genes, these organisms are unlikely to share a common metabolism. Further investigation will be needed to determine if these organisms evolved a more extensive use of NAD/NADP-dependent oxidoreductases and/or if they utilize oxidoreductases that arose from the fusion of genes encoding small PECs and their partner oxidoreductase. Such fusion proteins would not have been detected by our 200 amino acid cutoff.
Our size analysis revealed significant variation in PEC lengths. The average lengths of the [2Fe-2S] Fds, [4Fe-4S] Fds, and Flds were smallest in organisms containing a single PEC. This observation suggests that organisms with more than one PEC may require longer primary structures to support increased partner specificity, allowing organisms to discriminate which PEC is involved in an ET pathway (Onda et al., 2000;Terauchi et al., 2009;Duhutrel et al., 2010;Peden et al., 2013;Cassier-Chauvat and Chauvat, 2014;Atkinson et al., 2016;Burkhart et al., 2019). Our size analysis also revealed a large dispersion of gene lengths with multiple modes for each PEC type. For example, multiple modes were observed with the [4Fe-4S] Fds, which varied with the number of total small PECs encoded by genomes. While these modes occurred at different lengths, we observed family members with almost every possible size connecting these modes. This observation can be contrasted with the size distribution of structurally-characterized PECs. A narrower distribution of PEC sizes occurs in the structures within the PDB (Berman, 2000). This finding suggests that one way to obtain greater insight into PEC structural diversity would be to obtain structural data for PECs exhibiting a greater diversity of lengths.
To better understand the underlying reason for variation in PEC lengths, we evaluated how the primary structure of each PEC type changes with organismal growth requirements. We uncovered widespread variation in PEC structure and gene pool makeup that coincides with changes in an organism's O 2 requirement and tolerance. Previous research has found that Fd sequence extensions and partner protein binding can both enhance O 2 tolerance of [4Fe-4S] clusters through shielding (Jagannathan and Golbeck, 2009 Our bioinformatics analysis supports the idea that [4Fe-4S] Fds represent the most ancient low potential PEC family (Sousa et al., 2013). Organisms harboring [4Fe-4S] Fds were observed extensively across the tree of life, but they occurred with the greatest abundance in Archaea, which are deeply rooted (Sousa et al., 2013 Cyanobacteria, suggesting that these PECs arose prior to Flds in this phylum to support photosynthesis and diversified through duplication and mutation prior to these organisms evolving Flds. One avenue to unraveling this question may be to examine the PEC "fossil record" from a structural perspective. A recent study using the structural database to study oxidoreductase evolution observed a modular origin of biological ET chains (Raanan et al., 2018). An additional way to elucidate this question is to use protein design to test ideas about ancestral PECs that are no longer observed in nature (Mutter et al., 2019).
In a small number of organisms, proteomic studies have provided evidence that cells differentially control the flow of electrons across metabolic pathways by diversifying their PEC pool. Our finding that PECs are abundant in many genomes across the tree of life illustrates the need to understand the rules that guide PEC partner specificity. Structural studies have provided some insight into the molecular interactions that mediate PEC interactions with partner oxidoreductases, including structures of PEC-partner complexes (Morales et al., 2000;Kurisu et al., 2001;Müller et al., 2001;Dai et al., 2007;Xu et al., 2008;Strushkevich et al., 2011). However, these studies are limited to a small number of protein complexes. Even for the best-characterized PECs, we lack rules for anticipating partner specificities and predicting the electron fluxome. Our understanding of sequencestructure-electrochemical properties further limits our ability to anticipate PEC-mediated control over electron flow in cells. Relatively few PECs have had their midpoint reduction potentials measured (Atkinson et al., 2016), and strategies for characterizing the electrochemical properties of PECs are low throughput due to the need for protein overexpression and purification prior to electrochemical characterization. Unfortunately, algorithms for predicting midpoint potentials from primary structure are not yet sufficiently accurate and robust to predict PEC midpoint potentials without the need for in vitro characterization (Perrin and Ichiye, 2013).
In the future, high-throughput methods for comparing the ET efficiencies of PECs with defined partner proteins could help develop rules that relate PEC sequence to partner specificities. Cellular assays that couple biomass production to PEC-mediated ET in synthetic pathways have been reported and utilized to study both natural and synthetic PECs (Barstow et al., 2011;Atkinson et al., 2016Atkinson et al., , 2019. Such methods could be leveraged to analyze the partner specificities of any PEC imaginable, since genes are cheap to synthesize. We posit that the best PECs to analyze in such assays will be those having divergent sequences, which can be identified using sequence similarity networks (Brown and Babbitt, 2012). It may also be possible to uncover PECs with strong partner interactions by identifying operons that colocalize PECs with their partner oxidoreductases (Gerlt, 2017).
Further biochemical studies will be required to evaluate whether oxidoreductases colocalized with PECs exhibit greater specificity for one another compared with PECs encoded in more distal genomic regions. We hypothesize that the quickest way to obtain this information will be through high-throughput cellular assays that couple electron transfer between a PEC and its partner to cell growth (Atkinson et al., 2016). Large amounts of specificity data can be generated by expressing different PEC homologs in the presence of the same partner oxidoreductases and cataloging differences in growth that are observed with different PEC-partner combinations (Barstow et al., 2011;Atkinson et al., 2019). Since growth is proportional to electron transfer to a partner protein, the growth data obtained in such assays reflects the relative specificity of PECs for the same partner protein.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.       (Berman, 2000). The number of structures used to generate these plots are noted at the bottom of each panel.  The extrema are noted with horizontal bars at the edges, and the average is noted with an internal horizontal bar. The number of individual genes in each category is listed below the plots.