Ancestral Absence of Electron Transport Chains in Patescibacteria and DPANN

Recent discoveries suggest that the candidate superphyla Patescibacteria and DPANN constitute a large fraction of the phylogenetic diversity of Bacteria and Archaea. Their small genomes and limited coding potential have been hypothesized to be ancestral adaptations to obligate symbiotic lifestyles. To test this hypothesis, we performed cell–cell association, genomic, and phylogenetic analyses on 4,829 individual cells of Bacteria and Archaea from 46 globally distributed surface and subsurface field samples. This confirmed the ubiquity and abundance of Patescibacteria and DPANN in subsurface environments, the small size of their genomes and cells, and the divergence of their gene content from other Bacteria and Archaea. Our analyses suggest that most Patescibacteria and DPANN in the studied subsurface environments do not form specific physical associations with other microorganisms. These data also suggest that their unusual genomic features and prevalent auxotrophies may be a result of ancestral, minimal cellular energy transduction mechanisms that lack respiration, thus relying solely on fermentation for energy conservation.


INTRODUCTION
Cultivation-independent research tools have revealed the coding potential (DNA sequence-based prediction of gene functions) of numerous, deep branches of Bacteria and Archaea that were unknown until recently (Wrighton et al., 2012;Rinke et al., 2013;Brown et al., 2015;Castelle et al., 2015). Among them, the candidate bacterial superphylum Patescibacteria (also known as Candidate Phyla Radiation, CPR) and archaeal superphylum DPANN have garnered particular attention, as they appear to constitute a large fraction of the total microbial diversity Hug et al., 2016;Dombrowski et al., 2019). Patescibacteria and DPANN are characterized by small genomes and cell sizes, and predicted minimal biosynthetic and metabolic potential (Wrighton et al., 2012;Luef et al., 2015;. They also appear to have slow metabolism, as indicated by low per-cell ribosome counts (Luef et al., 2015) and slow estimated genome replication rates . Host-dependent endoand ecto-symbioses have been observed in several Patescibacteria (Gong et al., 2014;He et al., 2015;Cross et al., 2019) and the Nanoarchaeota and Nanohaloarchaeota phyla within DPANN (Huber et al., 2002;Podar et al., 2013;Munson-McGee et al., 2015;Jarett et al., 2018;Hamm et al., 2019). Symbiotic associations have also been inferred in several studies of Micrarchaeota (Golyshina et al., 2017;Krause et al., 2017) and Huberarchaeota (Schwank et al., 2019). As a result, it has been posited that the unusual biological features of Patescibacteria and DPANN reflect ancestral adaptations to symbiotic lifestyles Dombrowski et al., 2019). However, direct evidence of symbiosis in Patescibacteria and DPANN is limited to a small number of phylogenetic groups inhabiting surface environments and, in the case of Patescibacteria, dependent on eukaryotic hosts (Gong et al., 2014) or eukaryotic host systems (He et al., 2015;Cross et al., 2019) (i.e., mammalian oral cavities), which suggests relatively recent adaptations.
Here, we performed physical cell-cell association, genomic and phylogenetic analyses on 4,829 individual microbial cells from 46 globally distributed and environmentally diverse locations to gain additional insights into the unusual biological features of Patescibacteria and DPANN. Consistent with prior reports, we found these two superphyla abundant in many subsurface environments, and also confirm their consistently small cell and genome sizes. Our single cell genomic and biophysical observations do not support the prevailing view that Patescibacteria and DPANN are dominated by symbionts Dombrowski et al., 2019). Instead, based on the apparent lack of genes for complete electron transport systems, we hypothesize that last common ancestors of these two superphyla have either not evolved or have lost the capacity for respiration and therefore rely on fermentative metabolisms for energy conservation. Although complex metabolic interdependencies are a rule rather than exception in natural microbiomes (Zengler and Zaramela, 2018), the predicted fermentative energy conservation and limited biosynthetic potential Dombrowski et al., 2019) of Patescibacteria and DPANN may define a highly communal lifestyle of these two superphyla and provide explanation for the extreme difficulty in obtaining them in pure culture.

Field Sample Collection
Field samples were collected from a global set of diverse environments (Figure 1) that were found to contain candidate phyla of Bacteria and Archaea in prior studies (Rinke et al., 2013;Thomas et al., 2013;Moser et al., 2015;Becraft et al., 2017;Hershey et al., 2018;Sackett, 2018;Sackett et al., 2018Sackett et al., , 2019. Immediately after collection, samples were amended with sterile 5% glycerol and 1 mM EDTA (final concentrations) and stored at −80 • C. Field sample metadata is provided for each single amplified genome (SAG) in Supplementary Table S1.
Single Amplified Genome (SAG) Generation, Sequencing, and de novo Assembly Single amplified genome generation and sequencing were performed by Bigelow Laboratory for Ocean Sciences Single Cell Genomics Center (SCGC) and U.S. Department of Energy Joint Genome Institute (JGI) ( Supplementary Table S1). At SCGC, field samples were stained with SYTO-9 nucleic acids stain (Thermo Fisher Scientific), separated using fluorescenceactivated cell sorting (FACS), lysed using a combination of freeze-thaw and alkaline treatment, and their genomic DNA was amplified using WGA-X in a cleanroom, as previously described (Stepanauskas et al., 2017). For sorting of cells with active oxidoreductases, the Beatrix field sample (plate AG-274) was pre-incubated with the RedoxSensor Green stain (Thermo Fisher Scientific) following manufacturer's instructions. Prior to cell sorting, field samples were pre-filtered through a 40 µm mesh-size cell strainer (Becton Dickinson). During cell sorting, cell size estimates were performed using calibrated index FACS (Stepanauskas et al., 2017). All SAGs generated at SCGC were subject to Low Coverage Sequencing (LoCoS) using a modified Nextera library preparation protocol and NextSeq 500 (Illumina) sequencing instrumentation (Stepanauskas et al., 2017). This resulted in a variable number of 2×150 bp reads per SAG, with an average of ∼300k paired-end reads. The reads were de novo assembled using a customized workflow utilizing SPAdes 3.9.0 (Bankevich et al., 2012), as previously described (Stepanauskas et al., 2017). The quality of the sequencing reads was assessed using FastQC 0.11.2 and the quality of the assembled genomes (contamination and completeness) was assessed using checkM (Parks et al., 2015) and tetramer frequency analysis (Woyke et al.,FIGURE 2 | Maximum likelihood phylogenetic trees of concatenated single copy proteins from Bacteria (A) and Archaea (B). All SAGs from this study are highlighted red. Patescibacteria and DPANN are highlighted with gray shading and labeled by individual proposed phyla within the superphyla (Rinke et al., 2013;Brown et al., 2015). Tree source files are available as Supplementary Data S1.
2009). This SAG generation, sequencing and assembly workflow was previously evaluated for assembly errors using three bacterial benchmark cultures with diverse genome complexity and GC content (%), indicating no non-target and undefined bases in the assemblies and average frequencies of mis-assemblies, indels and mismatches per 100 kbp being 0.9, 1.8, and 4.7 (Stepanauskas et al., 2017). Functional annotation was first performed using Prokka 1.12 (Seemann, 2014) with default Swiss-Prot databases supplied by the software. Prokka was run a second time with a custom protein annotation database built from compiling Swiss-Prot (Bateman et al., 2017) entries for Archaea and Bacteria. The uniquely barcoded sequencing libraries of SAGs belonging to candidate divisions were combined, in equal proportions, into 48-library pools and shipped to JGI for deeper (post-LoCoS) sequencing with NextSeq 500 (Illumina) in 2×150 bp mode. Quality filtering of raw reads was performed with BBTools v.37, read normalization with BBNorm, and error correction with Tadpole 1 . The resulting reads were assembled with SPAdes (Nurk et al., 2013) (v3.9.0, -phred-offset 33 -sc -k 22,55,95 -12), and 200 bp was trimmed from the ends of assembled contigs, after which contigs with read coverage < 2 or < 2 kbp in length were discarded. Assemblies were annotated according to IMG standard protocols (Huntemann et al., 2016;Chen et al., 2019). All SAGs are publicly available in IMG/M (Chen et al., 2019), and can be found under their GOLD analysis project identifiers in Supplementary Table S1.

Identification of Heterogenous DNA Sources
The 16S ribosomal RNA gene was identified in SAGs by searching them individually using cmsearch 1.1.2, which is part of the Infernal 1.1.2 package (Nawrocki and Eddy, 2013), using the bacterial 16S rRNA Rfam covariance model 1 . This method is particularly helpful in predicting 16S rRNA genes in Patescibacteria and DPANN, which often have introns in their 16S rRNA genes . Taxonomic assignments to these 16S rRNA genes were conducted using "classify.seqs" within mothur (Schloss et al., 2009) version 1.41.3 against the Silva 132 reference database and taxonomy file (Quast et al., 2013). The resulting taxonomy file was used to search for SAGs that contained two 16S rRNA genes that had different taxonomic phylum-level assignments and were marked as putative co-sorts; those that did not have two 16S rRNA genes were marked as single sorts. The checkM (Parks et al., 2015) contamination estimates ≥ 10% were used to identify SAGs that had high probability of genome admixture (e.g., two different cellular origins). A Chi-squared test was performed in R using the "chisq.test" function on potential co-sorted and single sorted SAGs, and Pearson's residuals were retrieved from the output of this test and used to calculate the percent contribution to each X 2 statistic, and plotted using the "corrplot" package in R Studio.

Genomes From Prior Studies
Taxonomic classification at class, order and family level was not available for many members of candidate phyla in public databases. Therefore, we used data-driven de-replication to generate a representative set of genomes from the ∼70,000 bacterial and archaeal genomes in the Integrated Microbial Genomes and Microbiomes (IMG/M) database (Chen et al., 2019) (genomes accessed April 2018) spanning all bacterial and archaeal phyla. Genomes were selected by clustering the RNA polymerase COG0086 protein sequence at 70% amino acid FIGURE 3 | Relative abundance of Patescibacteria and DPANN in randomized SAG sets from 34 geographically diverse field samples. To avoid potential biases of the deeper sequencing of selected SAGs, only LoCoS results were used in this analysis. SAG microplate identifiers can be cross-referenced with specific SAGs and geographic sites in Supplementary Table S1. The AG-274 SAG microplate was generated from small cells from a water-filled rock fracture at 1,340 m depth below surface in the Beatrix gold mine in South Africa. sequence similarity. From each resulting cluster, the genome with the most complete set of 56 single copy proteins was chosen as a representative. This resulted in a total of 1,025 publicly available SAGs, metagenome bins, and isolate genomes (Supplementary Table S2). Phylum-level classification and symbiotic lifestyle assignments were exported from IMG/M. In cases where IMG/M lacked lifestyle assignments, manual literature searches of organism names were used to determine whether they have documented symbiotic relationships.

Concatenated Single Copy Protein Phylogeny
A set of 56 universal single-copy marker proteins (Eloe-Fadrosh et al., 2016;Yu et al., 2017) was used to build a phylogenetic tree for the newly generated SAGs and MAGs and a representative set of bacteria and archaea based on publicly available microbial genomes in IMG/M (Chen et al., 2019) (genomes accessed in April 2018). Marker proteins were identified with hmmsearch 3.1b2 (Eddy, 2011), using a specific Hidden Markov Model for each of the markers. Genomes for which five or more different marker proteins could be identified were included in the tree. For every marker protein, alignments were built with MAFFT v7.294b (Nakamura et al., 2018) and subsequently trimmed with BMGE v1.12 (Criscuolo and Gribaldo, 2010) using BLOSUM30. Single protein alignments were concatenated and maximum-likelihood phylogenies inferred with FastTree 2.1.10 (Price et al., 2010) using the options: -spr 4 -mlacc 2 -slownni -lg (for archaea) and -spr 4 -mlacc 2 -slownni -wag (for bacteria).

Clusters of Orthologous Groups Principal Coordinates Analysis
Clusters of orthologous groups (COGs) were assigned to SAG (Supplementary Table S1) and reference genome (Supplementary Table S4) predicted protein sequences using reverse position-specific blast (rpsblast 2.7.1+) (Altschul et al., 1997) with an e-value cutoff of 1e-5 and the cdd2cog.pl script 2 . Genomes that were used for the principal coordinates analysis (PCA) had completeness estimates greater than or equal to 30%, and contained 16S rRNA genes for unambiguous phylum-level classification. Eigenvector values were calculated in RStudio 1.1.463 (RStudio Team, 2016) using the cmdscale function from relative abundances of the different COG categories expressed as a percent out of the total number of assigned COGs. PCA plots were visualized with ggplot2 (Wickham, 2016) in RStudio (RStudio Team, 2016). A Wilcoxon test was performed in RStudio using the "wilcox.test" function to determine statistical differences between principal coordinates among the different clusters discussed in the main text. The color scheme for these plots is based on the Color Universal Design 3 , and should be distinguishable by all types of vision. This color scheme was used throughout all the figures in the manuscript.

Coding Sequence Density
Coding sequences (CDS) for SAGs and reference genomes were predicted using Prodigal 2.6.3 (Hyatt et al., 2010). The initial analysis of prokka 1.12 CDS density revealed that numerous SAGs and reference genomes had very low coding densities (less than ∼50%). Prokka utilizes the code 11 translation table by default, and many of these genomes could potentially use stop codons in place of canonical codons (Wrighton et al., 2012;Rinke et al., 2013). We determined the correct translation table to utilize for each genome by comparing the total CDS length from Code 11 and Code 25 predictions, and if the Code 11 total CDS length was greater than the Code 25 total CDS length, then the total length from Code 11 was used in the coding density calculation. If the opposite was true, then the Code 25 total CDS length was used. The coding density was calculated by dividing the total CDS sequence by the total assembly size.

Oxygen Reductase Identification
A published heme-copper oxidase subunit I database (Sousa et al., 2011) from bacteria and archaea was used as a database with blastp 2.7.1+ (Altschul et al., 1990) with an e-value cutoff of 1e-10 using the SAG and reference genomes as queries. The original database file was de-replicated by removing 100% identical sequences, using the dedupe.sh script, which is part of the BBMap package 4 . The sole crystal structure sequence for the bd-ubiquinol oxidase subunit A from Geobacillus thermodentrificans (Safarian et al., 2016) was used as a database for a blastp (Altschul et al., 1990) search using the SAGs and reference genomes as queries with an e-value cutoff of 1e-10.

Oxygen Reductase Horizontal Gene Transfer
The protein sequences identified from the above section were retrieved from SAGs using the grep function from the list of sequence file headers from the above analysis in the SeqKit 0.10.1 package (Shen et al., 2016). Reference protein sequences for Patescibacteria were retrieved via the blastp 2.7.1+ server using the Patescibacteria SAG HCO sequences as queries and selecting for hits only from sequences that were assigned to Patescibacteria and/or CPR. Other reference sequences for Patescibacteria were retrieved by manual literature searches from relevant studies (Nelson and Stegen, 2015;León-Zayas et al., 2017;. The search for Patescibacteria HCOs revealed that they only encoded for the low-affinity Type A HCO, and all  subsequent phylogenetic analyses focused solely on this HCO type. The multi-fasta file containing all HCO sequences was filtered for sequences that were greater than 400 amino acids in length, and aligned with mafft 7.294b (Nakamura et al., 2018) using the "-auto" option and the resulting alignment was trimmed with trimal (Capella-Gutiérrez et al., 2009) to remove gaps using the "-gappyout" option. A maximum-likelihood phylogenetic tree was created using FastTree 2.1.10 (Price et al., 2010) using the LG model of amino acid evolution. No DPANN genome to date has had a positive identification of an HCO subunit I. The methodology for the HCO phylogeny was repeated for the bd-ubiquinol oxygen reductases. Phylogenetic trees were visualized and annotated using the online Interactive Tree of Life tool (Letunic and Bork, 2019).

Oxidoreductase Annotation and Abundance
Enzyme Commission 1 (EC1) class family proteins (i.e., oxidoreductases) were predicted from the SAGs and reference genomes using the prokka "genome.tsv" annotation files. The total number of predicted protein sequences annotated as EC1 for each genome was divided by the total number of predicted protein sequences to provide the percent of protein encoding genes that were predicted to be oxidoreductases. This allows for a direct comparison of all the genomes that exhibited a wide range in completeness estimates.

KEGG Orthology Assignment of Electron Transport Chain Proteins
The Kyoto Encyclopedia of Gene and Genomes (KEGG) orthology (KO) annotations were assigned using KofamKOALA 1.0.0 (Aramaki et al., 2019), which uses hmmsearch 3.1b2 (Eddy, 2011) against curated hidden Markov model (HMM) KO profiles. Only KO profiles related to energy transduction oxidoreductases were used to search the genomes in this study, which were extracted from Supplementary Table 1 in Jelen et al. (2016). Sequences were identified as positive hits if their score was greater than or equal to 50% of the sequence threshold value as calculated in KofamKOALA 1.0.0.  Table S1) and other studies (Supplementary Table S2) with >30% completeness and near-full-length 16S rRNA genes. The vector plot in the upper left corner shows the COG categories that contributed the most to the separation of the genomes: Translation, ribosomal structure and biogenesis (J), RNA processing and modification (A), transcription (K), replication, recombination and repair (L), chromatin structure and dynamics (B), cell cycle control, cell division, chromosome partitioning (D), nuclear structure (Y), defense mechanisms (V), signal transduction mechanisms (T), cell wall/membrane/envelope biogenesis (M), cell motility (N), cytoskeleton (Z), extracellular structures (W), intracellular trafficking, secretion, and vesicular transport (U), posttranslational modification, protein turnover, chaperones (O), energy production and conversion (C), carbohydrate transport and metabolism (G), amino acid transport and metabolism (E), nucleotide transport and metabolism (F), coenzyme transport and metabolism (H), lipid transport and metabolism (I), inorganic ion transport and metabolism (P), secondary metabolites biosynthesis, transport and catabolism (Q), general function prediction only (R), function unknown (S). SAG, single amplified genome; MAG, metagenome assembled genome. Symbiont genomes are listed in Supplementary Table S4. Note position of Nanoarchaeum equitans with black arrow.

16S Ribosomal RNA Gene Phylogeny
16S rRNA gene sequences predicted using cmsearch 1.1.2 (Nawrocki and Eddy, 2013) were filtered for sequences that were greater than or equal to 1200 bp using bioawk 20110810 5 . Sequences that were 100% identical were removed using dedupe.sh (see text footnote 5). Sequences were then aligned using ssu-align 0.1.1 (Nawrocki, 2009), which produces two separate alignment files for Bacteria and Archaea. Next, ambiguously aligned positions were removed using ssu-mask, and sequences were re-checked to ensure that the masked alignment contained sequences that were greater than or equal to 1200 bp. Sequences that did not meet these threshold requirements were removed from the alignment file using ssumask with the "-seq-r" option and list of sequences to remove. The Stockholm alignment file was converted to an aligned fasta file using ssu-mask with the "-stk2afa" option. The masked and filtered alignment files for Bacteria and Archaea were used to create phylogenetic trees using maximum-likelihood 5 https://github.com/lh3/bioawk reconstruction with FastTree 2.1.10 (Price et al., 2010) with the following parameters: "-nt -gtr -cat 20 -gamma." Both trees were visualized and annotated using the Interactive Tree of Life (Letunic and Bork, 2019).

Global Presence of Patescibacteria and DPANN in Subsurface Environments
To improve our understanding of the deep genealogy of Bacteria and Archaea, we sequenced 4,829 microbial SAGs (Supplementary Table S1) from 46 globally distributed field sites (Figure 1 and Supplementary Table S1). These sites were chosen based on 16S rRNA gene amplicon screens that were enriched in bacterial and archaeal candidate phyla. We identified 16% and 2% of SAGs as Patescibacteria (n = 770) and DPANN archaea (n = 113) (Figure 2). The concatenated SCP phylogenetic tree revealed the separation of Patescibacteria and DPANN from other Bacteria and Archaea, respectively,    Table S1) and previously reported genome sequences (Supplementary Table S2). The four innermost rings depict the counts of the electron transport chain complexes: I (NADH dehydrogenase subunits), II (succinate dehydrogenase subunits), III (cytochrome c reductase subunits), and IV (oxygen, nitrate, sulfate, iron, arsenate, and selenate reductase subunits). The outermost ring shows the relative abundance of oxidoreductases (Ox) for each genome assembly as a gradient from low (blue) to high (yellow). The peripheral stacked bar charts show the counts of oxygen reductases from both the heme copper oxidase and bd-ubiquinol oxidase oxygen reductase (O 2 red) families grouped as high (orange) or low (sky blue) affinity for oxygen (note scale bar differences between bacterial and archaeal trees). of 7.5% (range = 0-23%) in 33 analyzed environmental sites, with elevated abundances in deep-sourced aquifer environments (Figure 3). Most of the Patescibacteria and DPANN SAGs originated from 13 continental subsurface sites in Africa, Asia, and North America (Supplementary Table S1). These results confirm that Patescibacteria and DPANN are globally abundant members of subsurface microbial communities, expanding on the prior genomic studies that were predominantly based on a small number of study locations in North America (Rinke et al., 2013;Luef et al., 2015;.

Evidence for Physical Cell-Cell Associations
We searched for evidence of physical cell-cell associations-an implication of obligate symbiosis-by identifying genomic sequences from multiple phylogenetically distinct organisms within individual SAGs that have been subjected for deeper, post-LoCoS sequencing. First, we searched for multiple copies of conserved, single-copy protein-encoding genes using checkM (Parks et al., 2015), which is a commonly used tool to detect genome contamination. This approach identified 1% of Patescibacteria SAGs (5/492), 1.2% of DPANN SAGs (1/81), and 0.3% of SAGs from other phyla (5/1686) as containing DNA from heterogeneous sources ( Supplementary Table S3). Next, we searched for near-full-length (>1,000 bp) 16S rRNA genes in individual SAG assemblies with taxonomic placement in different phyla. Among the deep-sequenced SAGs, such cases accounted for 1.5% of Patescibacteria (4/262), 0% DPANN (0/56), and 0.53% for other phyla (4/758) (Supplementary Table S3). A Chi-square test revealed that there was a significant relationship between phyla and potentially co-sorted SAGs from both checkM (p-value = 1.2 × 10 -13 ; X 2 = 224.2) and 16S rRNA gene analyses (p-value < 2.2 × 10 -16 ; X 2 = 238.07), but the overall contribution of Patescibacteria and DPANN to the significance of co-sorted SAGs was very low (<0.5%) relative to other phyla (Figure 4). Due to the incomplete SAG assemblies (Supplementary Table S1), these sequencing-based approaches may underestimate the overall frequency of cellcell associations in our data set. However, they consistently show that putative cell-cell associations constitute only a minor fraction of all SAGs, and that Patescibacteria and DPANN are not significantly enriched in such associations relative to other phyla in the studied environments. Furthermore, all identified cases of heterogeneous DNA in SAG assemblies were phylogenetically unique (Supplementary Table S3), in contrast to the recurring Nanoarchaeota-Crenarchaeota symbiotic associations found using the same techniques in hot springs in prior studies (Munson-McGee et al., 2015;Jarett et al., 2018). Also, in mammalian oral microbiomes, Saccharibacteria have been shown to be specifically associated with Actinobacteria hosts (He et al., 2015;Cross et al., 2019). This suggests that the infrequent and inconsistent presence of taxonomically heterogeneous DNA in SAGs most likely originated from nonspecific aggregation of multiple cells and/or attachment of extracellular DNA.
Based on a small number of transmission electron micrograph observations, it has been suggested that Patescibacteria associations with other microorganisms may be fragile (Luef et al., 2015). Thus, we cannot rule out the possibility that some Patescibacteria and DPANN cells were attached to other cells in situ and became detached during sample collection and processing. To reduce the risk of dispersing natural cell aggregates and associations, we performed only a gentle mixing of the analyzed samples in preparation for cell sorting. In prior studies, similar techniques successfully revealed host-symbiont associations in termite guts (Hongoh et al., 2008), marine plankton (Martinez-Garcia et al., 2012) and hot springs (Jarett et al., 2018). This approach was also used to determine symbiotic associations between anaerobic methane-oxidizing archaea and their syntrophic partners in natural consortia from methane seeps (Hatzenpichler et al., 2016). It is worthy to note that the Saccharibacteria-Actinobacteria symbiont-host relationship was only disrupted by physical passage through a narrowgauge needle multiple times (He et al., 2015). Also, putatively co-sorted SAGs of Nanoarchaeota and Crenarchaeota from iron oxide microbial mats were treated by a repeated physical disruption through multiple wash cycles and density-gradient centrifugation, from which co-sorted cells were obtained (Jarett et al., 2018). Symbioses may vary in the strength of their physical inter-cellular associations, e.g., due to differences in their metabolic interdependencies. However, although the techniques applied here may underestimate the overall counts of cell-cell associations in situ, we found no evidence for Patescibacteria and DPANN to be enriched in such associations relative to other phyla, and to form lineage-specific associations in the analyzed environments.

Cell Diameters
We employed calibrated index FACS to determine physical diameters of individual cells that were used in SAG generation (Stepanauskas et al., 2017). This indicated that Patescibacteria and DPANN cells are extremely small across their entire phylogenetic breadth, with median estimated diameters of 0.2 µm (Figure 5). Several outliers with estimated larger diameters may be due to attachment to other cells and particles, cellular division, methodological artifacts, or true biological variation. Most of the SAGs with identified heterogeneous genome sources were larger than their phylum median cell diameters (Supplementary Table S1), which is consistent with their aggregation with other cells. The low frequency of Patescibacteria and DPANN DNA recovery from larger particles (Supplementary Table S1 and Figure 5) provides further indication that most of these cells were not attached to other microorganisms.
To further investigate the composition of extremely small cells, we generated a complementary library of SAGs from a single subsurface sample (microplate AG-274; Supplementary Table S1) with a FACS gate targeting only ≤0.3 µm particles. Confirming our expectations, >90% of SAGs in this cell diameter-specific library were composed of Patescibacteria and DPANN (Figure 3). The obtained cellular size ranges are consistent with a prior report, which was based on transmissionelectron micrographs from one field study site (Luef et al., 2015). These cell diameters approximate the lower theoretical limits for cellular life (Maniloff et al., 1997).

General Genome Features
To identify functional coding potential differences of Patescibacteria and DPANN compared to other Bacteria and Archaea, we performed a principal coordinates analysis (PCA) using the relative abundance of COGs as input variables with SAGs that had at least 30% completeness and a near full-length 16S rRNA gene (Figure 6). This showed a clear separation of Patescibacteria and DPANN from other bacteria and archaea along the first coordinate (PC1) (Wilcoxon signed-rank test; p-value < 2.2 × 10 −16 ). Importantly, welldescribed symbionts (Supplementary Table S4) separated from Patescibacteria along PC1 and DPANN along PC2 (p-value = 2.57 × 10 −8 and 1.0 × 10 −7 for Patescibacteria and DPANN, respectively). Please note that this does not include Nanoarchaeum equitans, which clusters with the other DPANN but is also a known symbiont. The only lineages that clustered with Patescibacteria and DPANN along PC1 and PC2 were Dependentiae and Tenericutes, respectively.
The COG categories with the greatest negative effect on PC1, indicative of their relative depletion in Patescibacteria and DPANN, included E (amino acid metabolism and transport), C (energy production and conversion), P (inorganic ion transport and metabolism), and H (coenzyme transport and metabolism). The COG categories with the greatest positive effect on PC1, indicative of their relatively high fraction in genomes of Patescibacteria and DPANN, included D (cell cycle control and mitosis) and O (post-translational modification, protein turnover, chaperone functions). Archaea separated from bacteria along the second coordinate (PC2) (p-value < 2.2 × 10 −16 ) primarily by their relative enrichment in COG categories B (chromatin structure and dynamics), K (transcription), and S (unknown functions). This reflects the major inter-domain differences in DNA packing and transcription, and the greater fraction of archaeal genomes remaining uncharacterized, as compared to the genomes of Bacteria.
Genomes recently shaped by symbiosis often have low coding densities due to rapid gene loss and pseudogene formation (McCutcheon and Moran, 2012). Inconsistent with this pattern, we found the coding density of Patescibacteria and DPANN (median = 91%) to be typical of Bacteria and Archaea (median = 90%), while well-characterized symbionts were separated by their lower coding density ( Figure 7A) (median = 0.87%, p-value = 0.035 and 0.028 compared to Patescibacteria and DPANN). Although the small genome size of Patescibacteria and DPANN has been viewed as an indication of a symbiotic lifestyle , similar genome sizes (1-2 Mbp) are typical among freeliving, marine plankton (Swan et al., 2013;Giovannoni et al., 2014). Furthermore, recent synthetic biology experimentation has pushed the minimal genome size limit of a free-living microorganism to ∼0.5 Mbp (Hutchison et al., 2016), far below the predicted sizes of Patescibacteria and DPANN genomes. Collectively, these general genome features of Patescibacteria and DPANN do not provide convincing evidence of an obligate symbiotic lifestyle.
In this context, the observed gene content similarities between Patescibacteria and Dependentiae, and between DPANN and most Tenericutes are intriguing (Figure 6). Dependentiae is a candidate bacterial phylum that has been noted for its reduced coding potential, including a depletion in electron transport chain components (McLean et al., 2013;Yeoh et al., 2016). It has been speculated that these characteristics indicate a symbiotic lifestyle, with energy acquired from hosts via ATP/ADP translocases, which has been confirmed experimentally in a few Dependentiae members (Delafont et al., 2015;Pagnier et al., 2015;Deeg et al., 2019). The well-characterized members of the bacterial phylum Tenericutes consist mostly of obligate pathogens with reduced genomes (Moran and Wernegreen, 2000). Interestingly, most Tenericutes are able to grow as free-living cells in rich media solely by fermentation (Tully et al., 1977), and were originally hypothesized to represent ancient lineages of life due to their small genome sizes and limited metabolisms (Morowitz, 1984). While we found all analyzed Dependentiae and most Tenericutes deplete in oxidoreductases (Figures 7B, 8), only Tenericutes had a consistently low coding density (median = 71%) that is a characteristic of recently evolved symbionts (McCutcheon and Moran, 2012) ( Figure 7A). Thus, we hypothesize that these two phyla cluster with the Patescibacteria and DPANN due to similar metabolic features arrived at by convergent evolutionary processes.

Oxygen Reductase Genes
In search for an alternative explanation for the unique genealogy, genome content, and cell sizes of Patescibacteria and DPANN, we examined energy metabolic coding potential in deepsequenced (post-LoCoS) SAGs. We found that only 0.6% of Patescibacteria SAGs (3/492) and none of the DPANN SAGs (0/81) encoded homologs of oxygen reductases (O 2 red), as indicated by the presence of oxygen-binding subunit I of either the heme-copper oxidase (HCO) or bd-ubiquinol (bd) oxidase families. Although individual assemblies of 492 Patescibacteria and 81 DPANN SAGs were incomplete, cumulatively they represent an estimate of 162 and 27 randomly sampled, complete genomes. Therefore, the incompleteness of individual SAGs cannot explain the consistent absence of the targeted genes. Furthermore, a phylogenetic analysis revealed that all three O 2 red from Patescibacteria SAGs form a cluster with other Patescibacteria sequences Nelson and Stegen, 2015;León-Zayas et al., 2017) that is nested within a clade comprised of other phyla (Figure 9). We infer these phylogenetic relationships as an indication of a relatively recent horizontal gene transfer (HGT), likely from Proteobacteria and Firmicutes for the HCO and bd sequences, respectively. Although we did not detect any homologs of oxygen reductases in DPANN SAGs from our samples, the publicly available bd O 2 red sequences from DPANN metagenome bins and isolates formed a clade with Actinobacteria and Firmicutes, which we also infer as likely products of relatively recent HGT events. The topology of these O 2 red phylogenetic trees is consistent with prior reports, which have also been interpreted as evidence for prevalent HGT of oxygen reductase genes among other phyla Gribaldo et al., 2009;Borisov et al., 2011). This suggests that the absence of oxygen reductases in Patescibacteria and DPANN is ancestral for the two superphyla.

Distribution of Electron Transport Chain Complexes
Patescibacteria and DPANN were depleted in the entire family of oxidoreductase enzyme genes compared to other bacteria and archaea, (p-value < 2.2 × 10 −16 ) (Figures 7B, 8). This depletion was also significant in relation to symbionts with their comparatively small genome sizes (p-value < 0.05). Oxidoreductases are key components of both aerobic and anaerobic respiratory pathways (Jelen et al., 2016), so underrepresentation of them would suggest reduced functionality of these energy transduction mechanisms. Accordingly, none of the Patescibacteria and DPANN genomes were found to encode a complete ETC consisting of all four complexes (Figure 8). Putative homologs of at least two of the four ETC complexes were found only in 3% and 11% of Patescibacteria and DPANN genomes, respectively. We found putative homologs of genes encoding individual complexes I, II, III, and IV in 0%, 2%, 3%, and 14% of Patescibacteria genomes. The corresponding numbers for DPANN were 7%, 4%, 0%, and 21%. Some of these computationally predicted genes are only distantly related to experimentally verified homologs and therefore may constitute false positives. These findings are consistent with the lack of complete ETC reports in prior studies of Patescibacteria genomes , with the sole exception of a tentative nitric oxide respiration operon found in a single metagenome bin (Castelle et al., 2017). The sparse and scattered distribution of the putative ETC gene homologs in Patescibacteria and DPANN (Figure 8) suggest HGT origins rather than ancestral inheritance. This is consistent with the phylogenetic reconstructions of other energy transducing genes identified in Patescibacteria, which also suggest evolutionary origins from HGT (Jaffe et al., 2020). Collectively, our observations indicate that the absence of complete electron transport chains in Patescibacteria and DPANN is an ancestral feature of the two superphyla, which we propose is more parsimonious than multiple gene loss events due to obligate symbiosis Hug et al., 2016;Dombrowski et al., 2019;Méheust et al., 2019).

Respiration Activity
To experimentally test for the presence of active oxidoreductases in a subsurface microbial community, we employed the fluorogenic oxidoreductase probe RedoxSensor Green on a deep groundwater sample from South Dakota. This revealed a wide range in fluorescence intensity in phylogenetically diverse cells, with none of the Patescibacteria cells exceeding the fluorescence of particles in a heat-killed, negative control (Figure 10). To the best of our knowledge, RedoxSensor Green has not been tested extensively on diverse microbial lineages, therefore these results should be considered tentative. Nonetheless, both genome content and in situ physiology analyses indicate the absence of respiration in Patescibacteria and DPANN, which is consistent with earlier reports of these lineages containing few, if any, components of energy transducing pathways other than fermentation .

16S rRNA Gene Phylogeny
The placement of Patescibacteria and DPANN in the tree of life is widely debated (Hug et al., 2016;Williams et al., 2017;Dombrowski et al., 2019). Most current phylogenetic inferences are based on concatenated single-copy proteins (CSCP), which has the advantage of higher phylogenetic resolution, as compared to phylogenies of individual genes (Rinke et al., 2013). However, the unknown genetic change at heterogeneously evolving sites and large sequence divergence may limit the accuracy of such trees (Pace, 2009;Dombrowski et al., 2019). To complement the CSCP-resolved genealogy (Figure 2), we performed a largescale phylogenetic analysis of the well-established 16S rRNA gene (Woese, 2002) (length > 1,200 bp) separately for Bacteria and Archaea. The obtained phylogenetic inference (Figure 8) supported the separation of Patescibacteria and DPANN from other bacterial and archaeal lineages, in agreement with the phylogenies based on CSCP genes   (Figure 2) and a recent, large-scale bacterial 16S rRNA gene tree . Importantly, we did not observe grouping of Patescibacteria with fast-evolving lineages (e.g., obligate insect symbionts and Tenericutes) that could be due to long branch attraction in the 16S rRNA gene phylogeny. This suggests that the divergent branching of Patescibacteria and DPANN is probably not a result of recent, accelerated divergence.

CONCLUSION
Using the collective evidence from cell-cell association, coding potential and phylogenetic analyses, we propose a new explanation of the unusual biological features of Patescibacteria and DPANN. Although the Patescibacteria and DPANN contain symbionts (Huber et al., 2002;Podar et al., 2013;Gong et al., 2014;He et al., 2015;Munson-McGee et al., 2015;Golyshina et al., 2017;Krause et al., 2017;Jarett et al., 2018;Cross et al., 2019;Hamm et al., 2019;Schwank et al., 2019), we believe that there is not sufficient evidence to conclude that adaptations to symbiosis have led to the reduction of their cell sizes and coding potential Dombrowski et al., 2019;Méheust et al., 2019). Instead, our data indicate that most Patescibacteria and DPANN do not form symbiotic cell-cell associations in subsurface environments, and that their divergent coding potential, small genomes, and small cell sizes may be a result of an ancestral, primitive energy metabolism that relies solely on substrate-level phosphorylation (fermentation). This primitive mode of energy conservation may either precede the evolution of electron transport phosphorylation (respiration) or be a result of an ancestral loss of respiration capabilities. Auxotrophies are very common among microorganisms and represent a wide range of dependencies for exogenous cellular components (Zengler and Zaramela, 2018). Patescibacteria and DPANN may be on the extreme end of the spectrum in their dependence on other community members, perhaps a reflection of an ancient evolutionary strategy to limit cellular biosynthetic energy requirements, as energetic allocation is a major driver of genome evolution in bacteria and archaea (Lynch and Marinov, 2015).

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
JPB led data analyses and manuscript preparation. RS developed the concept and managed the project, with contributions by TW, TO, DM, JE, JPB, and EB. EB, JMB, FS, JJ, OB, and KC contributed to data analyses. NP performed cell sorting and size calibration at Bigelow Laboratory. TO, DM, PD, NR, JS, BH, KK, SS, ME, HB, and MS oversaw field sample collection. All authors contributed to data interpretation and manuscript preparation.

FUNDING
This work was funded by the USA National Science Foundation grants 1441717, 1826734, and 1335810 (to RS); and 1460861 (REU site at Bigelow Laboratory for Ocean Sciences). RS was also supported by the Simons Foundation grant 510023. TW, FS, and JJ were funded by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility supported under Contract No. DE-AC02-05CH11231. NR group was funded by the Russian Science Foundation (grant 19-14-00245). SS was funded by USA National Science Foundation grants OCE-0452333 and OCE-1136727. BH was funded by NASA Exobiology grant 80NSSC17K0548.
Project Office); and Jaret Heise, Tom Regan and others at Sanford Underground Research Facility (SURF); Corey Lee (US Fish and Wildlife Service) and The Nature Conservancy for site access and sampling assistance in Nevada and South Dakota. Special thanks to Brittany Kruger, Josh Sackett, Scott Hamilton-Brehm, John Healey, Brad Lyles, and Chuck Russell of DRI for sampling assistance in Nevada. Thanks to the U.S. Forest Service for a permit to conduct research at Little Hot Creek, California and the 2015 International Geobiology Course for field work and sample collection. We thank Vitaly Kadnikov, Olga Karnachuk and Tamara Zemskaya for sample collection in Siberia. Also, thanks to Stephen Grasby, Allyson Brady, and Christine Sharp for sampling Canadian field samples, and Wen-Jun Li for logistical support and access to hot spring samples in China. We thank British Columbia Parks and Parks Canada for permission to sample Dewar Creek and Paint Pots field sites.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmicb. 2020.01848/full#supplementary-material TABLE S1 | Genome sequencing and environmental data associated with SAGs from this study. Deep-sequenced SAGs have an ID in the column "gold.analysis.id," whereas SAGs sequenced using LoCoS only do not have this ID.     Figure 5. DATA S1 | Source files for maximum likelihood phylogenetic trees of concatenated single copy proteins from Bacteria and Archaea ( Figure 2). DATA S2 | Source files for maximum likelihood phylogenetic trees of 16S rRNA genes from Bacteria and Archaea (Figure 8).