Respiratory Heme A-Containing Oxidases Originated in the Ancestors of Iron-Oxidizing Bacteria

Respiration is a major trait shaping the biology of many environments. Cytochrome oxidase containing heme A (COX) is a common terminal oxidase in aerobic bacteria and is the only one in mammalian mitochondria. The synthesis of heme A is catalyzed by heme A synthase (CtaA/Cox15), an enzyme that most likely coevolved with COX. The evolutionary origin of COX in bacteria has remained unknown. Using extensive sequence and phylogenetic analysis, we show that the ancestral type of heme A synthases is present in iron-oxidizing Proteobacteria such as Acidithiobacillus spp. These bacteria also contain a deep branching form of the major COX subunit (COX1) and an ancestral variant of CtaG, a protein that is specifically required for COX biogenesis. Our work thus suggests that the ancestors of extant iron-oxidizers were the first to evolve COX. Consistent with this conclusion, acidophilic iron-oxidizing prokaryotes lived on emerged land around the time for which there is the earliest geochemical evidence of aerobic respiration on earth. Hence, ecological niches of iron oxidation have apparently promoted the evolution of aerobic respiration.


Analysis of CtaA proteins
In the present work, we expanded genomic surveys to evaluate all proteins classified within the Cox15-CtaA super-family (https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi , accessed on 2 November 2020). We have scanned also recently deposited metagenome-assembled genomes (MAGs) from marine and aquatic environments to find new variants of CtaA that are present in genomes containing COX operons encoding family A oxidases, which were not considered in our previous analyses (Degli Esposti et al, 2020). After building a manually curated reference alignment encompassing all major types of CtaA proteins, we added any new candidate sequence obtained from BLAST searches, usually with E-values below 10 -3except for the most distant type 0 CtaA proteins that are not picked by conventional searches. Sequences that showed substitutions of more than one of the five invariant residues that are considered important for CtaA function, namely a glutamic acid (E57) and four histidine residues that may ligate hemes (Hederstedt, 2012;Niwa et al, 2018), were annotated (cf. Table 1) but then removed from the alignment used for reconstructing phylogenetic trees. Conversely, we maintained sequences showing a conservative substitution of only one of such residues in some alignments used to reconstruct phylogenetic trees, such as that in Supplementary Figure S3. However, the alignments that were refined for reconstructing the trees shown in Figures 3 and 4 contained only sequences exhibiting the mentioned five invariant residues.
In our analysis, we initially kept as a reference the first published phylogenetic tree of CtaA proteins (He et al, 2016), which showed the deep separation of type 1 and type 2 into two major clades without an apparent root, with the CtaA of Microcystis sp., a freshwater taxon of Cyanobacteria, apparently forming the deepest branch of type 1. After examining all the genomes of Cyanobacteria in current versions of National Center for Biotechnology Information (NCBI) repositories, we found that CtaA proteins of the Microcystis genus lie in the middle of a clade including ca. 800 full length proteins of Cyanobacteria that are currently available, the deepest branching of which is Gloeobacter CtaA ( Figure 3A and Supplementary Figure S2a), in agreement with the phylogeny of the phylum Cyanobacteria (Uyeda et al, 2006). Moreover, cyanobacterial CtaA sequences are nested within a large clade in sister position to the clade including type 1.1 CtaA of Proteobacteria ( Figure 3A and Supplementary Figures S2-S4).
Type 0 protein sequences are so divergent from other types of CtaA that they were (and still are) not easily recognized as members of the COX15-CtaA super-family. They were originally discovered as the product of a gene present at the end of the rus operon of certain iron-oxidizers ( Figure 1B), cf. (Appia-Ayme et al, 1999;Issotta et al, 2018;Quatrini etal, 2009). CtaA proteins of Acidithiobacillus spp., together with those of other proteobacterial acidophilic Fe 2+ -oxidizers such as Acidiferrobacter and Acidihalobacter spp., possess the invariant histidine and glutamate residues that are involved in heme binding, but lack the long ECL1 at the positive side of the membrane that is characteristic of type 2 CtaA Short CtaA proteins of 4TM with similarity to either half of, for example, B. subtilis CtaA are characteristically present in various lineages of archaea, for example in Aeropyrum pernix (Lewin and Hederstedt, 2006). Such proteins were omitted from further analysis because they segregated with different bacterial clades depending upon the taxonomic composition and method used for reconstructing phylogenetic trees (Supplementary Figure S1, cf. [Degli Esposti et al, 2020]). Moreover, these short CtaA proteins, now classified as type 1.4 (Table 1), showed long branches in phylogenetic trees (Supplementary Figure S1c), thereby producing systematic errors in their phylogenetic placement. As mentioned in the main text, we found a group of long-branch CtaA including rare proteins from Candidate Phyla Radiation (CPR), such as Ca. Margulisbacteria or Ca. Fraserbacteria, which have been picked up in Blast searches such as that shown in Supplementary Figure S6. These proteins do not represent defined CtaA subclades, since their position tend to vary in phylogenetic trees and belong to MAGs lacking complete COX operons. Moreover, the representative full length protein from Ca.
Margulisbacteria (accession MSR88539, from a lake metagenome) might not be functional, since it lacks the first conserved histidine ligand for heme binding.
Four clades of bacterial type 1 CtaA consistently showed long branches in phylogenetic trees obtained using different approaches. The CtaA of Thermus spp. is the N terminal part of a bifunctional protein that includes CtaB, which segregates with similar fused proteins from Armatimonadetes. These proteins have been ignored previously (He et al, 2016) and are classified as type 1.0 here because they lack the cysteine pair in ECL3 (Table 1). They constitute a long-branch subclade (Supplementary Figure S1c) that most frequently clusters with the clade of type 2 CtaA ( Figure 3A and Supplementary Figures S2 and S5). A second group of CtaA fused with CtaB encompasses proteins from Oligoflexia (family Bacteriovoraceae) and unclassified Proteobacteria, cumulatively labelled here as 'type 1.1 fused Oligoflexia' (Figures 3 and   4). These CtaA proteins show a wider separation between the two Cys residues in ECL3 than in most type 1.1 proteins ( Table 1). The third group of long-branch CtaA includes proteins from Ca.
Margulisbacteria and Ca. Fraserbacteria mentioned earlier. Finally, type 2 proteins exhibits a branch length that is usually larger than average (Supplementary Figure S1). Due to this feature, it is possible that the recurrent clustering of type 2 CtaA with the clade of fused type 1.0 from Thermus derives from LBA artifacts or other aberrations of phylogenetic trees. We addressed this issue by using mixture substitution models (e.g., [Le et al, 2010]) and found that both the clustering of the Thermus clade with the type 2 clade and that of the clade of type 1.1 fused Oligoflexia with the type 1.5 clade were maintained with strong support (Figure 4 and Supplementary Figure S4). However, clades of fused CtaA proteins always exhibited long branches, irrespective of the substitution model used for reconstructing the phylogenetic trees ( Figure 4 and data not shown).
Another approach that is traditionally used to reduce LBA artifacts is to remove the sequences that exhibit long branches, eventually substituting them with related sequences (either structurally or taxonomically) to limit the inevitable information loss (Bleidorn, 2017). Following this approach, we found that the removal of a single CtaA sequence or group of CtaA sequences exhibiting long branches did not significantly alter the overall tree topology. However, their removal generally reduced the support and topological reproducibility of internal nodes, especially in trees including 100 or more sequences encompassing all the taxa that have recognized CtaA proteins (Supplementary Figures S4 and S5a; see also [Degli Esposti et al, 2020]). Such taxonomically wide trees include proteins that show a substitution rate that is significantly higher than average, as previously found for other membrane-bound redox proteins (Khadka et al, 2018). This was the case, in particular, for two type 1 proteins form the genus Alicyclobacillus (acidophilic Firmicutes) that clearly differed from other CtaA of the same taxon (Supplementary Figure S5a and data not shown). Overall, however, the substitution rate was relatively similar in other proteins or sub-branches of CtaA trees, with the notable exception of Andalucia Cox15 (Supplementary Figures S3 and S4), presumably due to the notoriously fast rate of mutation of mitochondrial DNA. In any case, the basal position of the clade containing type 0 CtaA proteins was consistently found in phylogenetic trees obtained with different approaches, either in the presence or absence of the DUF420 outgroup (Figures 3 and 4,Supplementary Figures S3 and S4).
In the genome of strongly acidophilic Proteobacteria, genes for type 0 CtaA were found to be linked to COX genes on the chromosome. In other Proteobacteria that are not strongly acidophilic, and have diverse physiology, genes for type 0 CtaA are scattered on the chromosome. These taxa include the ironmetabolizing Metallibacterium, the mildly acidophilic and heterotrophic Acidimangrovimonas sediminis of the Rhodobacterales order of Alphaproteobacteria (Ren et al, 2019), the nitrite-oxidizing Nitrococcus mobilis and members of the Salinisphaera genus that have extremophilic character (Supplementary Figure S6). One gene for type 0 CtaA is present in a Solirubrobacterales MAG. Solirubrobacterales are a deep branching lineage of Actinobacteria found in soil environments (Hu et al, 2019)

Integrated approaches to resolve COX1 phylogeny
The evidence that type 0 CtaA may represents an ancestral form of heme A synthases would imply that the taxa that have the gene for this protein may also have deep branching forms of the major subunits of the COX enzyme, following the logical principle of co-evolution. This principle formed part of our methodological approaches permeated of reductionism (Bridandt and Love, 2007) to simplify the complex biological problem of finding the possible origin of COX. Focusing on the phylogeny of the major subunit COX1, we started with the A1 type present in thermoacidophilic archaea that also have type 0 CtaA (Supplementary Figure S6). Initially, the selection of these proteins was made following BlastP searches of Halobacterium COX1 against the nr database and represented the major lineages of aerobic archaea having family A oxidases (Blank, 2009;Sousa et al, 2012). Complementary BlastP searches were undertaken with COX1 proteins from other archaea such as Ca. Marsarchaeota. A large clade of proteins from marine Euryarchaeota MAG was omitted after finding that it clustered very closely to the COX1 of Betaproteobacteria (in the tree of Figure 5 they would thus cluster with the Rhodobacteraceae proteins). The full sequences were aligned with only minimal trimming of the N terminal part (generally restricted to a few residues for the longest proteins), first with the MUSCLE algorithm within the MEGA program and then refined manually to match the known structure of P.
denitrificans COX as described in Materials and Methods. Once the alignment of 30 full sequences of COX1 had been refined, we applied both Bayesian and ML inference to reconstruct robust phylogenetic trees, on which we then annotated the distribution of CtaA types and other features that appear to be ancestral in the molecular evolution of COX1 (Degli Esposti, 2020), as shown in Figure 5.
After establishing that archaea may be excluded from the ancestry of COX (see main text), we focused our work on the possible resolution of the phylogeny of bacterial COX1 of the A family. This task initially appeared to be next to impossible, given the complexity of the molecular evolution of COX subunits that emerged from multiple previous studies (Degli Esposti et al, 2019;Ducluzeau et al, 2014;Golyshina et al, 2016;Han et al, 2011;Hemp et al, 2008;Pereira et al, 2001;Spang et al, 2019).
However, the task was simplified by the exclusion of archaeal COX proteins from the likely ancestry of the COX enzyme (Fig. 5). We also excluded B family oxidases from further analysis for the reasons presented in the main text. We applied diverse approaches of sequence analysis to complement phylogenetic studies, realizing that phylogenetic trees alone can hardly resolve COX evolutionary patterns over long times such as that spanning the GOE (Degli Esposti, 2020). Our analysis was extended to all currently available bacterial genomes and produced a wealth of information that is summarized in Supplementary Table S1. The information is subdivided among the ca. 30 subclades of prokaryotic COX1 that have been consistently found using diverse approaches of phylogenetic inference (see Materials and Methods for details) and often coincide with recognized operons (Supplementary Table S1, cf. [Degli Esposti, 2020]). The definition of such subclades subsequently guided the analysis of the phylogenetic trees reconstructed with bacterial COX1, and also with bacterial COX2, providing a simple consistent labeling for the branches containing proteins from different lineages sharing similar molecular features.
Given the broad taxonomic distribution of COX among diverse bacterial lineages, the choice of taxonomic sampling has paramount importance for reconstructing trees of COX sequences. We tackled this issue by carrying out a systematic search of COX1 variants to define the subclades and evaluate their main properties of branch length and substitution rate in phylogenetic trees (Supplementary Table S1).
On the basis of such information, we subsequently excluded COX1-3 fusion proteins exhibiting long branches ( Fig. 6 and Supplementary Figure S8). Late diverging COX1 proteins belonging to split gene clusters such as that of Mycobacterium spp. were also excluded for equivalent reasons. Next, we excluded the clade of ubiquinol oxidases corresponding to the cytochrome bo3 subtype of A1 type COX (Degli Esposti et al, 2019) because their COX1 protein displayed a substitution rate that was much higher than that of other family A oxidases (Supplementary Figure S9B). We progressively narrowed the taxonomic sampling to a set of COX1 proteins that exhibited close to average values of either branch length or branch substitution rate, which in principle would reduce the potential problems related to LBA.
Moreover, we verified that it was not necessary to use many members of the same subclade to provide a consistent placement of its branch in phylogenetic trees; in many cases, two members with significantly divergent sequence could define such a phylogenetic position. We thus settled on sets of 70 to 80 COX1 protein sequences, including a handful of FixN paralogs as outgroup, to conduct a thorough examination of the reproducibility of COX1 trees.
A crucial portion of the information in Supplementary Table S1 is derived from a wide survey of the molecular variation in the residues that form the proton-conducting channels in family A oxidases (Supplementary Figure S7), and expand the data in a previous study on the topic (Degli Esposti, 2020).
Bacterial family A oxidases contain two proton-conducting channels that are fundamental for efficient energy conservation coupled to the reduction of oxygen in the catalytic cycle (Han et al, 2011;Pereira et al, 2001;Sharma and Wikström, 2014;Iwata et al, 1995). While the D-channel is apparently present only in the family A, the K-channel is common to family A, B, and C of the HCO superfamily. In family C, this channel is contributed by two sets of protonable amino acid residues, one specific to the family and one in common with the family A (Degli Esposti, 2020), and has been frequently labeled K C -channel (Ahn et al, 2018). Moreover, protonable amino acid residues are present in equivalent positions of those specific to the K C -channel in several COX1 proteins of family A oxidases, but not B family oxidases, in a pattern supporting the hypothesis that these oxidases gradually evolved from family C oxidases (Castresana and Saraste, 1995). See reference (Degli Esposti, 2020) for a recent review.
The amino acid residues of the K-channel reflect the functional evolution of COX without clear instances of co-evolution in residues that are spatially adjacent to each other (Wang and Pollock, 2007). Such residues are often located in different TM of the protein (Ahn et al, 2018;Iwata et al, 1995;Pereira et al, 2001;Svensson-Ek et al, 2002)  We systematically analyzed also the presence of additional TM at the N terminus of COX1 (which normally has 12 TM), abbreviated as 'TM extra' in Supplementary Table S1. This molecular feature appears to have ancestral character in COX1 proteins, since it is shared with several family C paralogs (Degli Esposti, 2020) and consequently can contribute information for disentangling COX phylogeny.
The most common feature of 2 additional TM at the N-terminus, abbreviated as '2TM extra', is mentioned in the main text with regard to the molecular phylogeny of archaeal COX, since it is present in COX1-3 fusion proteins of Ca. Marsarchaeota ( Figure 5). In phylogenetic trees of bacterial COX1, the presence of the '2TM extra' feature provided a new way to systematically label proteins that are usually classified as type A2, even if they have the molecular signature PEVY in TM6 that is normally associated with type A1 oxidases (Pereira et al, 2001;Sousa et al, 2012). The situation often applied to COX1 proteins from acidophilic Proteobacteria such as Acidithiobacillus spp. and acidophilic Firmicutes such as Sulfobacillus spp., as well as to a variety of deep branching COX1 from soil or aquatic environments (Degli Esposti, 2020). Hence, phylogenetic trees have been predominantly constructed by including full length COX1 sequences, thus maintaining the N-terminal part that was routinely trimmed in previous studies ( LGT), or earlier in the common ancestor of both major clades. In the latter scenario, the ancestral form of the 'canonical' K-channel would have been lost at least six times by different modifications along COX evolution, even multiple times within individual subclades (particularly that of CyoCAB at the bottom of the tree). The former scenario would be more conservative regarding the evolution of the 'canonical' Kchannel because it could explain its presence in subclades of type A2, for example the subtype a-I, by LGT events to their common ancestor. However, the same scenario would hardly explain the presence of K C -channel variants in three subclades of the type A1, for instance in that of Elusimicrobia ( Supplementary Figures 7 and 9A).
Conversely, mapping the various categories of the K-channel onto the ML tree (Supplementary Figure   S9C) suggested a discrete evolution of the 'canonical' K-channel from an ancestral K C -channel (or hybrid) shared with the common ancestor of family C paralogs at the node indicated by the red arrow in Figure S9C. The common ancestor represented by this node would have possessed a hybrid form of the K-channel that could mutate into either a 'canonical' K-channel or a K C -channel-like by a few amino acid changes, consistent with the variation in channel-forming protonable amino acid residues (Supplementary Figure S7). In other words, the plasticity of the COX1 protein (Ducluzeau et al, 2014) and the integrated evolution of its functionally related residues that are involved in proton pumping (Degli Esposti, 2020) may have led to sequential variants of the K-channel along COX evolution, without involving multiple events of LGT across taxonomically separate lineages of bacteria. The topology of the ML tree in Supplementary Figure S9C, therefore, looks more plausible than the topology displayed by the tree in Supplementary Figure S9A.
Mapping the presence of the 2TM extra feature on phylogenetic trees provided further support to the above conclusion, since it implied a single loss rather than two or more losses along COX evolution (Supplementary Figure S9C). However, trees such as that in Supplementary Figure S9C could hardly represent the most likely phylogeny of COX1 proteins, since they include branches with undesired features leading to artifacts, as mentioned in connection to the subclades of Solirubrobacterales. A second approach to alignment building and refinement was thus necessary for better resolving the phylogeny of bacterial COX1.

A second approach to resolve COX1 phylogeny
Usually, the alignment of multiple COX1 protein sequences for reproducing phylogenetic trees were built by computer programs first (see Materials and Methods), which invariably introduce many gaps to maximize local sequence similarity. Such gaps were reduced by subsequent manual refinements (Degli Esposti et al, 2019;Ducluzeau et al, 2014;Hemp and Gennis, 2008;Sousa et al, 2012). In contrast, our new approach started by aligning ca. 20 COX1 sequences, mostly possessing the 2TM extra feature, with a few FixN paralogs, which introduced a reduced set of gaps to match local sequence similarity after manual refinement of an initial alignment obtained with the MUSCLE program (Supplementary Figure   S10). The close reproducibility of phylogenetic trees obtained with different programs was verified before adding other COX1 sequences to the alignment (Supplementary Figure S10). Such additional sequences were added up to a total of about 50 without inserting new gaps. Inserts that would be required to maximize the match of additional sequences were not considered in expanding the alignment. These alignments provided maximal representation of the recognized subclades of bacterial COX1, after excluding those exhibiting clear deviations from average values of branch length or substitution rate.
Subsequently, detailed analysis of Bayesian BEAST trees obtained with these COX1 sets unveiled the presence of subclades with branches significantly longer than average, as shown in Supplementary Figure  S11B. The proteins of such branches were replaced and the analysis of branch length was undertaken again, revealing that COX1 of type A1 subtype a-III exhibited a branch that was significantly longer than average, thereby potentially reducing the strength in posterior support of an internal node of the branch containing type A1 COX1 (Supplementary Figure S12). The substitution of just one of the subtype a-III sequences that were originally present with that of another A1 subtype was sufficient to strengthen the support of this internal node and the overall stability of the Bayesian tree of 40 COX1 proteins (Figure 7, cf. Supplementary Figure S12). Thus, trees such as that shown Figure 7A could represent a reproducible robust phylogeny for bacterial COX1 exhibiting maximal overall support.

Analysis of CtaG proteins: caa3_CtaG
Phylogenetic trees of diverse caa3_CtaG proteins, the distribution of which is reported in Supplementary  Larger alignments than that used for generating the tree in Figure 8B were progressively purified of long branches (Supplementary Figures S13B and S14A) on the basis of the 95% confidence range of branch length values determined by the BEAST program, as described before for CtaA and COX1. Additionally, the weak posterior support for a branch in major clade 1 (orange circle in Supplementary Figure S14A) was enhanced by replacing several caa3_CtaG sequences that still exhibited long branches with those that showed close to average branch length and clustered in the same clade, thereby obtaining acceptable values of support: above 76% in the Bayesian tree ( Figure 8B), and over 80%Ultrafast bootstraps in the ML tree for the same alignment of caa3_CtaG proteins (Supplementary Figure S14B). This enhanced support for clade 1 was obtained without significant change in the overall topology of the phylogenetic trees, thereby indicating good reproducibility in evaluating the molecular phylogeny of caa3_CtaG proteins, despite the low number of amino acid sites and poor sequence identity (only one residue was invariant in the alignments used here, as mentioned in the main text).

Analysis of CtaG proteins: CtaG_Cox11
In the main text we have focused on the caa3_CtaG family of Cu chaperons that are required for the assembly of CuB in nascent COX1, because such proteins are present in iron oxidizers and other taxa that possess deep branching COX (Supplementary Tables S1 and S2). Here we present also the results of our phylogenetic analysis of the other family of CtaG, which is homologous to mitochondrial Cox11 and fulfills the above function in eukaryotes (Banci et al, 2004;Degli Esposti et al. 2019). This protein is part of the CtaG_Cox11 family that originated in Proteobacteria and is predominantly present in alphaproteobacteria as part of the A1 type COX operon subtype ab, generally in sinteny with COX3  Table S2) that provide a reasonable outgroup to root the phylogenetic trees, which were extended to COX11 proteins from diverse supergroups of eukaryotes (Burki et al, 2020) that are currently available in the nr database ( Supplementary Fig. S15). No homologs could be found in Archamoebae, though. Cox11 proteins from Archaeplastida (red and green algae) appear to form the deepest branches in the monophyletic clade of eukaryotic Cox11 (Supplementary Fig.   S15). Intriguingly, the bacterial sister clade to this eukaryotic clade consistently contains CtaG_Cox11 proteins from Holosporales, including Alphaproteobacteria bacterium 41_28 ( Supplementary Fig. S15 and data not shown -see also Supplementary Table S2) S15 and data not shown). Therefore, it is currently difficult to define probable alphaproteobacterial ancestors for mitochondrial COX11 on the basis of the available proteins and genomes.

Statistical analysis of phylogenetic trees
Phylogenetic analysis of COX1 included quantitative evaluation of the length and substitution rate of each branch as performed for CtaA (cf. Supplementary Figures S5 and S8b). We realized that the pervasive presence of long branches in COX1 trees demanded a thorough statistical analysis of the parameter of branch length to limit the artifacts arising from LBA and unequal rates of substitution in the phylogenetic trees. Rather than using the common 'inspection' of long branches in preliminary phylogenetic trees, for example (Spang et al, 2019), we undertook quantitative analysis of the length and substitution rate of tree branches following two approaches. In the case of ML trees, these branch parameters could be obtained as raw values using the FigTree program and were therefore compared across several trees obtained with the same program and alignments of similar size (Fig. 6). The statistical significance of deviations from the average were determined using the non-parametric test of Mann-Whitney with 95% confidence using the MiniTab19 program https://www.minitab.com/enus/products/minitab/ ; p values <0.05 were considered statistically significant. An equivalent analysis was undertaken with the median values of either branch length or branch substitution rate that were obtained from different BEAST Bayesian trees. Methylacidiphilium COX1 was not considered further in our phylogenetic analyses. Conversely, the COX1 proteins of Cyanobacteria generally displayed a median branch substitution rate that was significantly lower than average, in agreement with previous reports indicating a slow rate of substitution in membrane redox proteins of Cyanobacteria (Cardona et al, 2019; and references therein). Despite this slow substitution rate, removing cyanobacterial COX1 had little effect on the topology of phylogenetic trees. Therefore, representative cyanobacterial COX1 proteins were generally maintained in our analyses. Table and Figures   Table S1. Major subclades of COX1 proteins in prokaryotes. The table is provided also as an Excel file. Table S2. Distribution of CtaG proteins in bacterial subclades and eukaryotic supergroups. Figure S1. CtaA phylogenic analysis presented prior to this work.

Supplementary Table S1. Major subclades of COX1 proteins in prokaryotes.
Supplementary Table S1.xls KC, K C channel typical of C family oxidases, but present also in some family A oxidases ( Figure S7).    Table 1) and are presumed to be functional since no other heme A synthase is present in their genomes containing family A COX. The tree has been used as the basis for the model of CtaA molecular evolution in Figure 3B. Posterior values of branch support are in decimals, and are not as strong as in other Bayesian trees (e.g. Figure 3) because of the very limited number of invariant residues (Degli Esposti et al, 2019) and the large sequence variation exhibited by deduced CtaA sequences from MAGs. The alignment included 355 amino acid sites. Andalucia Cox15 clusters in a clade containing various type 1.1 proteins of marine alphaproteobacteria including those of the TMED109 order (see Supplementary Table S2), confirming previous results (Degli Esposti et al, 2020).

Fig. S3 new
CtaA95MCCFigTreePT 3 Dec, cf. model for Fig. 3b Supplementary Figure S4. Branch variation in phylogenetic trees of 100 bacterial CtaA proteins. Panel a. The IQ-Tree consensus ML tree was obtained with the mixture substitution model C20 (He et al, 2016) to improve the robustness of phylogenetic trees obtained with different types of CtaA proteins for a total of . Panel b. The IQ-Tree consensus ML tree was obtained with the same WAG model used for the Bayesian tree in Supplementary Figure S3. The trees in both panels were reconstructed from an alignment of 99 bacterial protein plus the Cox15 of Andalucia mitochondria that was slightly expanded form that used in Supplementary Figure  Panel a. BEAST MCC and full set of trees obtained with an alignment of 100 CtaA protein sequences. The clade containing proteins of Alicyclobacillus that have a significant higher rate of variation is highlighted with a red box. The total number of amino acid sites in the alignment was 363. The mirror image of the cumulative view of the same Bayesian trees obtained with the Densitree program is shown on the right. Panel b. Quantitative evaluation of median branch length for selected subclades or (sub)types of CtaA from four separate BEAST MCC trees including that in a, which were built from the alignment of 52 to 100 CtaA proteins (cf. Supplementary Figure  S4a). Error bars indicate standard deviation, S.D. The asterisks indicate statistically significant deviations from the values of the subclades on the left (p = 0.03, Mann-Whitney test). The NJ tree was obtained from a PSI-BLAST search using A. ferrooxidans ACK78561 type 0 CtaA as a query against the version of the nr database (https://blast.ncbi.nlm.nih.gov/Blast.cgi ) accessed on 5 September 2020. The search retrieved 118 hits for significant protein orthologs whether it was extended to 250 or more taxa; 28 of these hits were removed because they were either duplicate or partial proteins. Taxa of thermoacidophilic archaea are in light brown as in Figure 5, while acidophilic Proteobacteria are in dark blue; other taxa of Rhodobacterales and Gammaproteobacteria are in red and marine blue, respectively. The outgroup is represented by HHM71647, a type 1.0 CtaA from Ca. Fraserbacteria MAG from a hot spring metagenome that was picked by the search with E value worse than threshold, but showing 27% identity with the query. The tree topology is very similar to that obtained earlier from a comparable Blast search (Degli Esposti et al, 2020 A2 subtype a-I Nannocystis Myxobacteria A2 subtype a-I  Figure S7. Panel b. Amino acid variation of the K-channel in archaeal COX1 and COX2.
The original excel file from which this figure has been built is supplied as: Supplementary  Figure S8. Extended phylogenic tree of COX1 from bacterial and archaeal taxa.
Panel a. The ML tree was obtained with the program IQ-Tree using an alignment of 100 COX1 protein sequences from family A and B oxidases that covers the majority of the subclades analyzed here (Supplementary Table S1 see also Fig. 6). The mixture model of amino acid substitution EX_EHO (Le et al, 2010) was used for a total of 600 amino acid sites. Trees with essentially the same topology were obtained using the LG model (Fig. 6), but with less robust support in some internal nodes, expressed in %Ultrafast bootstraps as indicated. The few values ≤ 85% are shown, with the single one below 50% in red. The proteins were selected from diverse taxa as to represent the majority of COX operons that are currently recognized in bacteria lineages (Supplementary Table S1, cf. Degli Esposti, 2020) and were aligned with a combination of computer-assisted and manual refinement to match the known structure of P. denitrificans COX, as previously described (Pereira et al, 2001: Degli Esposti et al, 2020. The alignment, available upon request, included four FoxA paralogs from iron-oxidizing Sulfolobaceae (Kozubal et al., 2011) and was trimmed at the N terminus as in previous studies (  Panel a. Bayesian tree with annotated ancestral features. The Bayesian BEAST MCC tree was obtained with an alignment of 74 bacterial COX1 sequences and 6 FixN sequences, with a total of 682 amino acid sites. The distribution of variants of the K-channel and extra TM at the N terminus is annotated as in Figure 5. The values of posterior support was generally 100% except for the four branches with annotated numbers. Note the weak support (47%, in red as in Supplementary Figure S1) for the bottom clade combining A2 type with 2TM proteins. Panel b.
Median rate of substitution of selected subclades of A1 type COX1 was calculated from the values obtained from n = 5 separated Bayesian BEAST MCC trees as in Panel a, which included different combination of 64 to 90 COX1 proteins. The rate value for the bo3 ubiquinol oxidase is at least 3-fold larger than that of other clades (p = 0.012, Mann-Whitney test).  Figure S9 panel c. The representative ML tree was obtained with the program PhyML 3.0 using an alignment of 78 bacterial COX1 and FixN protein sequences that included their N terminus as in Figure 7, with a total of 682 amino acid sites. SH-like support is shown only for the nodes displaying less than 90% values. The distribution of the 2TM extra feature and of the different variants of the K-channel classified in Supplementary Figure S7 are annotated onto the tree branches using the symbols shown at the bottom of the figure, as in Figure 5. No significant change in the composition of the D-channel was observed among the proteins analyzed here.

Symbols:
2TM extra at N terminus; K-channel, full; Bayesian BEAST MCC tree of 48 COX1 proteins with median branch length a Range of branch length of COX1 subclades b Panel a. Bayesian BEAST MCC tree with 40 COX1 protein sequences including two proteins of the A1 type subclade a-III that show a branch length longer than average. The alignment had a total of 651 amino acid sites as in Supplementary Figure S10. The orange circle surrounds the node with low support to these proteins. Panel b. DensiTree image of the same 1801 Bayesian tree in panel a. Panel c. ML tree obtained with the PhyML program using the same alignment of 52 COX1 proteins without outgroup paralogs as that used in Figure 6B.