Influence of soil nutrients on the presence and distribution of CPR bacteria in a long-term crop rotation experiment

Bacteria affiliated with the Candidate Phyla Radiation (CPR) are a hyper-diverse group of ultra-small bacteria with versatile yet sparse metabolisms. However, most insights into this group come from a surprisingly small number of environments, and recovery of CPR bacteria from soils has been hindered due to their extremely low abundance within complex microbial assemblages. In this study we enriched soil samples from 14 different soil fertility treatments for ultra-small (<0.45 μm) bacteria in order to study rare soil CPR. 42 samples were sequenced, enabling the reconstruction of 27 quality CPR metagenome-assembled genomes (MAGs) further classified as Parcubacteria/Paceibacteria, Saccharibacteria/Saccharimonadia and ABY1, in addition to representative genomes from Gemmatimonadetes, Dependentiae and Chlamydae phyla. These genomes were fully annotated and used to reconstruct the CPR community across all 14 plots. Additionally, for five of these plots, the entire microbiota was reconstructed using 16S amplification, showing that specific soil CPR may form symbiotic relationships with a varied and circumstantial range of hosts. Cullars CPR had a prevalence of enzymes predicted to degrade plant-derived carbohydrates, which suggests they have a role in plant biomass degradation. Parcubacteria appear to be more apt at microfauna necromass degradation. Cullars Saccharibacteria and a Parcubacteria group were shown to carry a possible aerotolerance mechanism coupled with potential for aerobic respiration, which appear to be a unique adaptation to the oxic soil environment. Reconstruction of CPR communities across treatment plots showed that they were not impacted by changes in nutrient levels or microbiota composition, being only impacted by extreme conditions, causing some CPR to dominate the community. These findings corroborate the understanding that soil-dwelling CPR bacteria have a very broad symbiont range and have metabolic capabilities associated to soil environments which allows them to scavenge resources and form resilient communities. The contributions of these microbial dark matter species to soil ecology and plant interactions will be of significant interest in future studies.


Introduction
Since the discovery of Microgenomatia (OD11) from the Obsidian pools in Yellowstone National Park (Hugenholtz et al., 1998), the CPR have been reported in diverse environments leading to the discovery of Parcubacteria (OD1) and Asconditabacteria (SR1) taxa (Harris et al., 2004). The systematic study of these uncultured bacteria led to the description of the hyper-diverse monophyletic group described as the CPR (Brown et al., 2015), comprising more than 20% of known bacterial diversity Parks et al., 2017), significantly increasing the previously described Patescibacteria clade (Rinke et al., 2013).
Members of this group have small cells (0.009 ± 0.002 μm 3 ) (Luef et al., 2015) and patchy metabolisms with incomplete or lacking major pathways such as nucleotide, amino acid and lipid metabolism, TCA cycle or electron transport chains (Castelle et al., 2018), and are predicted to rely on alternate mechanisms to generate and exploit the proton motor force (Wrighton et al., 2012;Kantor et al., 2013). Despite these observed genome reductions, CPR bacteria have innovative features such as alternate fermentation enzymes (Kantor et al., 2013), hybrid RuBisCOs that function in an archaeal-like fashion (Wrighton et al., 2016), and deployment of other archaeal-like enzymes (Ghuneim et al., 2018). Additionally, they have been predicted to participate in nitrogen, sulfur and carbon cycling (Wrighton et al., 2014;Danczak et al., 2017). The foundation of our current understanding of CPR taxa focused on aquifers, groundwater, and associated sediment (Wrighton et al., 2012;Castelle et al., 2013;Brown et al., 2015;Luef et al., 2015;Anantharaman et al., 2016;Danczak et al., 2017). Therefore, the study of these ultra-small bacteria in other environmental contexts is imperative to better understand the extent of their diversity, metabolic capabilities, and ecological roles.
Soil environments harbor enormous bacterial diversity (Torsvik and Ovreas, 2002); however, members of the CPR group have only sparsely been reported in soil microbiome surveys (Yeates et al., 2000;Wagner et al., 2009;Lemos et al., 2020;Santana-Pereira et al., 2020). Recovery of soil dwelling CPR is challenging due to their extremely low abundances (Herrmann et al., 2019) and intronic 16S rRNA genes that evade amplicon-based surveys (Brown et al., 2015). Despite the advent of genome-resolved metagenomics (Sharon and Banfield, 2013), the challenges in assembling sequences from complex soil metagenomes has hindered the reconstruction of these low abundant CPR genomes from soil and to date only a few studies have achieved this goal (Kroeger et al., 2018;Woodcroft et al., 2018;Lemos et al., 2020;Sharrar et al., 2020;Nicolas et al., 2021).
Another significant challenge in accessing CPR genomes is their low relative abundance within soil microbiota. Other studies have used size fractionation to isolate distinct populations of microbial cells (Portillo et al., 2013), and to enrich for novel phylogenetic groups including CPR bacteria that may not be observed from studies of bulk soil (Nakai, 2020;Lee et al., 2021;Nicolas et al., 2021). We therefore used an ultra-small (<0.45 μm) bacterial cell enrichment protocol to enrich for low abundance CPR bacteria, and other members of the rare biosphere, from soil samples of the Cullars Rotation (Auburn, AL). The Cullars Rotation is the oldest ongoing soil fertility experiment in the southern United States, with a 3-year rotation of corn, cotton, soybean and winter legumes across 14 plots that vary significantly in NPK fertilizer and lime amendments (Zhao et al., 2015). Direct sequencing of the small cellderived metagenomic DNA allowed the reconstruction of 31 MAGs predominantly from CPR taxa. CPR genomes were annotated for their predicted potential for biomass degradation, and their abundance across Cullars Rotation soil treatment groups was correlated with differences in nutrient, pH, and metal levels, shedding light into their possible ecological roles and distribution patterns in soil.

Sampling site
The Cullars Rotation is a century old crop rotation experiment alternating between five crops during the course of a year (cotton, crimson clover, corn, wheat and soybean) (Mitchell et al., 2011). The rotation is composed of 14 plots for the study of the effects of controlled deficiencies of phosphorus (P), potassium (K), nitrogen (N) and sulfur (S), as well as other soil amendments such as lime (for pH control) and addition of a micronutrient mix containing boron (B), zinc (Zn), manganese (Mn), copper (Cu) and iron (Fe) on crop yield (Mitchell et al., 2005) (Figure 1).

Sample collection and soil analysis
Soil samples were obtained from the Cullars Rotation in March 2019. Three topsoil samples from each one of the 14 plots in Cullars Rotation were collected for soil DNA extraction, alongside ~400 g of bulk soil for analysis. The samples for DNA extraction were immediately transported to the lab and kept refrigerated at 4°C until processing. Processing of the 42 soil samples (3 replicates/14 plots) were completed on the same day as sampling. The bulk soil samples were submitted for analysis at Auburn University's Soil Testing Laboratory the same day. Soil nutrients were obtained using Mehlich I Extraction and analyzed by ICP. Measurements were obtained for calcium (Ca), K, magnesium (Mg), P, aluminum (Al), B, Cu, Fe, Mn, sodium (Na), Zn, Nitrate-Nitrogen (NO 3 -N), all in parts per million (ppm). Percent carbon ©, N, and pH were also measured.

Ultra-small cell enrichment
For each sample, 5 g of soil was suspended in 25 mL of 0.2% tetrasodium pyrophosphate buffer (pH 8.0), which has been shown to dissociate microbial cells from soil particulates (e.g., clay, sand and silt) while retaining the microbial community structure associated with bulk soil (Portillo et al., 2013;Yuan et al., 2018). Suspensions were homogenized for 10 min, with gentle inversions, followed by 10 cycles of intermittent vortexing for 5 s. The soil slurry was subjected to a low-speed centrifugation (5,000 x g for 3 min) to precipitate larger soil particulates and eukaryotic cells. The supernatant was collected and serially filtered through a 5.0 μm cellulose membrane, then 1.2 μm and 0.45 μm SUPOR ® membranes (Pall Corporation, Port Washington, New York) in tandem. The <0.45 μm fraction was then pelleted by centrifugation (20,000 x g for 20 min) and DNA was isolated from the cell pellet using the Power Soil DNA isolation kit (MoBIO Labs) to obtain the <0.45 μm fraction metagenomic DNA ( Figure 1). There were three topsoil replicates from each of the 14 plots in the Cullars Rotation, resulting in 42 different samples from which a < 0.45 μm fraction metagenome was generated.
Frontiers in Microbiology 03 frontiersin.org 2.4. Shotgun sequencing, binning, and genome annotation Each <0.45 μm fraction metagenome was uniquely indexed and paired-end sequenced using an Illumina HiSeq platform in one dedicated run (Figure 1). Raw reads from all metagenomes were processed using Trimmomatic v0.39 (Bolger et al., 2014). The remaining reads from all 42 samples were combined and assembled with MEGAHIT v1.2.9 (Li et al., 2015). The reads from each sample were mapped back against the assembled contigs using BWA v0.7.17 (Li and Durbin, 2009). Parallel binning was conducted with MetaBAT2 v2.15 (Kang et al., 2019), MaxBin 2.0 v2.2.7 (Wu et al., 2016) and CONCOCT v1.1.0 (Alneberg et al., 2014). Resulting genomic bins were curated using DasTool v1.1.4 (Sieber et al., 2018). The completeness and contamination of each bin was assessed using CheckM v1.1.3 (Parks et al., 2015) using the standard bacterial marker set and a CPR-specific phylogenetic marker set (Brown et al., 2015). Bins with completeness below 65%, contamination above 10% or strain heterogeneity above 10% were dropped from subsequent analyses. Bins with completeness above 85%, contamination below 5% and strain heterogeneity below 5% were considered high-quality MAGs, while bins with completeness above 75%, contamination below 5% and strain heterogeneity below 5% were considered good-quality MAGs and the remainder were considered draft MAGs. To aid in genome assembly and binning, an additional set of reads from a similar sequencing run of a pilot enrichment for Cullars Rotation's Plot C was added to the read pool for a second round of assembly and binning following the same steps and criteria above. The total bins obtained were dereplicated using Derep v0.12.8 (Olm et al., 2017) and used in the subsequent analysis.

MAG phylogenetic classification
MAGs were taxonomically classified using two strategies: 16S rRNA gene sequence homology and phylogenetic marker gene voting. First, 16S rRNA sequences were extracted using CheckM v1.1.3 (Parks et al., 2015) and a custom package for CPR 16S rRNA extraction and intron removal (Brown et al., 2015). The MAG-associated 16S rRNA genes were classified against the SILVA database (Quast et al., 2012) (accessed February, 2020) using SINA  through the web client. A homology threshold of 75% was used for phylum classification and 80% for class classification. For the second approach, 13 phylogenetically informative marker genes (Supplementary Table S1) corresponding to ribosomal proteins with similar phylogenetic signal to 16S rRNA (Gregor et al.,  Table with the Standard Fertilization treatments applied to each plot (Mitchell et al., 2005). (C). Average soil nutrient parameters from sampled bulk soil. (D). Overview of the methods employed to reconstruct the MAGs from the filtration enrichment (mini-metagenome), reconstruct the MAG community and reconstruct the total bacterial community using 16S amplification from soil total DNA extraction.
Frontiers in Microbiology 04 frontiersin.org 2016), were extracted from each bin using CheckM v1.1.3. The markers were annotated against the nr/nt NCBI database. Then each marker "casts a vote" according to their taxonomic annotation. The taxonomy with the higher number of "votes" was assigned to the MAG. Finally, the consensus classification from the two strategies was applied to the MAG. Both strategies used NCBI taxonomy, which is itself based on the taxonomy and nomenclature proposed by Brown et al. (2015). MAGs classified as CPR were selected for a robust additional classification by constructing a concatenated maximum likelihood tree. A curated collection of reference CPR genomes was constructed by downloading 1,200 CPR genomes from NCBI (accessed October 2019) and filtering out low quality MAGs with the program CheckM v1.1.3 by the following requirements: >75% completeness, < 5% contamination and strain heterogeneity. The set of 13 phylogenetic markers was extracted from each reference genome. For each marker, the Cullars MAG sequences and the references sequences were aligned using MAFFT v7.475 (Katoh and Standley, 2013), trimmed with TrimAl v1.4.1 (Capella-Gutierrez et al., 2009). Then, the marker alignments were concatenated using Phylemon 2.0 (Sanchez et al., 2011) and manually inspected with Geneious 10.3. 1 The concatenated alignment of 13 markers was used to generate a maximum likelihood tree with 1,000 bootstraps using RAxML v8.2.9 (Stamatakis, 2014). Marker genes from the Dependentiae MAGs were used to root the tree. The final tree was visualized and annotated using iTOL (Letunic and Bork, 2019), using Brown et al. (2015) proposed nomenclature.
In an effort to also comply with the efforts toward an standardized bacterial classification and taxonomy (Parks et al., 2020), our MAGs were classified according to GTDB  using GTDB-Tk v2.2 , and utilized the (Brown et al., 2015) proposed nomenclature for phylogenetic affiliations.

MAG metabolic potential analysis
To assess metabolic and nutrient utilization potential for each of our CPR MAGs, each of the sufficient quality MAGs and members of the same taxa from the curated collection of reference genomes, were annotated using DRAM v1.3.5 (Shaffer et al., 2020). This software utilizes a series of databases (UniRef90, PFAM, dbCAN, Refseq viral, VOGDB, and MEROPS peptidase) to annotate genes and reconstruct the organism's metabolic profile. DRAM annotations were analyzed with special attention to carbohydrate utilization via carbohydrate-active enzymes (CAZymes) . Profiles between MAGs and their corresponding references were compared using a 2-way analysis of variance (ANOVA) based on a linear regression model using the lme4 (Bates et al., 2015) and emmeans (Lenth et al., 2019) packages. Output was analyzed in RStudio v. 4.0.3.

Cullars <0.45 μm fraction bacterial community reconstruction and distribution across Cullars plots
The <0.45 μm fraction bacterial community from Cullars soil was reconstructed from the assembled MAGs. To estimate MAG counts, reads from each sample were individually mapped back against each MAG using Bowtie2 (Langmead and Salzberg, 2012) with high sensitivity. Well mapped reads were counted using SAMTools v1.3 ). The matrix with read counts per sample was imported into Rstudio. Library sizes were normalized using the cumulative scale sum strategy (CSS) with metagenomeSeq (Paulson et al., 2013). Counts were then normalized by genome length by dividing the total read length by the MAG genome size (bp). Coverages below 1% were discarded as inconclusive and assigned to zero (Levy-Booth et al., 2021). Fractions were rounded up to generate a list of normalized natural counts. These counts were used to reconstruct MAG community and calculate average MAG relative abundance per plot.
Rarefaction curves for each sample were calculated using the package Ranacapa (Kandlikar et al., 2018). For both MAG-and 16S rRNA gene-based community assessments, non-metric multidimensional scaling (NMDS) and correlation analysis (CA) were performed with Bray-Curtis calculated distances using the package PhyloSeq (McMurdie and Holmes, 2013). Differences in community composition across plots were accessed by performing PERMANOVA with 999 permutations and subsequent multivariate homogeneity of groups dispersions (Betadisper) using the package Vegan (Oksanen et al., 2013). The same package was used to perform vector fitting and surface fitting of Cullars Rotation soil parameters with significant correlation to community changes (p < 0.05).

Bulk soil bacterial community reconstruction and distribution across Cullars rotation plots
Total DNA was extracted from bulk soil samples which were the same used for CPR cell enrichment. DNA from plots 03, 08, 10 and 11 were used for 16S rRNA gene PCR amplification. The V4 region of the 16S rRNA gene was amplified and paired-end sequenced. Paired-end reads were processed using FLASH (Magoc and Salzberg, 2011) to produce raw tags, which were quality filtered following the QIIME2 protocol (Caporaso et al., 2010). The clean high-quality tags were compared against the SILVA database and chimeras were removed using UCHIME (Edgar et al., 2011). Sequences with >97% similarity were clustered into the same operational taxonomic unit (OTU) using Uparse. Taxonomic annotation of OTU representative sequences was carried out against the SILVA SSU database using Mothur (Schloss et al., 2009).
OTU absolute abundance was normalized by rarefaction, and the subsequent counts were used for alpha and beta diversity analyses. The community comparisons between plots and ordinations were constructed using the same programs outlines above for the MAG communities. Linear discriminant analysis Effect Size (LEfSE) analysis was conducted using the LEfSe software (Segata et al., 2011).

Network analysis
Normalized counts from the 10 most abundant phyla in the16S and MAG abundance data were combined to access the possible relationships between the relative abundance of CPR MAGs and 16S ribotypes from soil bacteria. The resulting count table was used to build the network using the package SPIEC-EASI (Kurtz et al., 2015). The graphs were drawn using the package iGraph (Csardi and

Recovering ultra-small bacteria genomes from Cullars rotation soil
A previous study of a Cullars Rotation soil metagenome revealed that CPR bacteria were present in this soil in extremely low abundance, but were accessible using a direct cloning and sequencing approach that avoided PCR and 16S rRNA geneassociated biases (Santana-Pereira et al., 2020). Therefore, for each of three samples from the 14 plots at Cullars Rotation, ultra-small cells <0.45 μm were enriched from soil samples to directly sequence and obtain CPR genome bins. From the 124 non-replicated bins, a total of 42 genomic bins were curated ranging from 400 Kb to 3.5 Mb (average 1.17 Mb). We identified 27 MAGs with acceptable quality, which was defined as completeness below 65%, contamination over 10% or strain heterogeneity over 10% (Supplementary Figure S1). These MAGs had an average completeness of 86.8%, with 5 bins later identified as CPR with 100% completeness, and an average genome size of 1.13 Mb in accordance with the reduced genome size characteristic of CPR bacteria. A noticeable exception was MAG DT 16 with a genome size of 3.55 Mb and completeness of 78.6%; however, this relatively larger genome was affiliated with the phylum Gemmatimonadetes which is not known to have a small size but has been previously observed to be capable of transmission through a small sized filter (Portillo et al., 2013).

Taxonomical classification of Cullars MAGs
CPR classification has experienced several revisions (Rinke et al., 2013;Brown et al., 2015;Parks et al., 2017Parks et al., , 2018, highlighting the challenges associated with conducting phylogenetic analyses of uncultured microorganisms (Konstantinidis et al., 2017). Initial studies on CPR taxonomy, based on up ta few thousands of genomes, categorized it as a monophyletic group encompassing several individual candidate phyla, such as Saccharibacteria, and the superphyla Parcubacteria and Microgenomates, each containing numerous candidate phyla (Brown et al., 2015;Anantharaman et al., 2016;Hug et al., 2016). This classification is most congruent to NCBI taxonomy and thus popularized. With the increasing availability of MAGs and genomes, much more comprehensive studies using from 8 to more than 90 thousand genomes made widespread rank normalization and standardization possible, settling on Patescibacteria/CPR as a single phylum (Parks et al., 2017(Parks et al., , 2018. This framework was used to develop the Genome Taxonomy DataBase (GTDB) (Chaumeil et al., 2019;Parks et al., 2020), which proposed new nomenclatures and taxonomic ranks for CPR. Thus the previous candidate phyla Saccharibacteria, Parcubacteria and ABY1, correspond to the classes Saccharimonadia, Paceibacteria and ABY1, respectively (Supplementary Table S2). Due to their widespread use and congruence with popular NCBI nomenclature, the names CPR, Saccharibacteria and Parcubacteria will be favored throughout the paper.
A collection of 13 phylogenetically informative marker genes (Supplementary Table S1) and 16S rRNA genes were extracted from Cullars MAGs and were independently compared to the NCBI GenBank nr/nt and SILVA databases, respectively, to form a consensus homology classification of the MAGs. All curated MAGs were classified, with one representative MAG recovered from each of the phyla Actinobacteria, Chlamydiae, Dependentiae and Gemmatimonadetes, and the remaining belonging to the CPR group. Within CPR, there were representatives of the Parcubacteria (16), Saccharibacteria (10) and ABY1 (1) groups ( Figure 2). Additionally, CPR MAGs were classified using the GTDB classifier, showing high congruence with the previous classification, since Parcubacteria, Saccharibacteria and ABY1 MAGs, were classified as Paceibacteria, Saccharimonadia and ABY1, respectively. Moreover, GTDB provided finer classification up to family level. Interestingly, a number of MAGs associated with underrepresented taxa in GTDB were recovered from Cullars Soil, such as representatives of families VEQN01 and CAYRIW01, which have only 3 and 1 reference genomes, respectively.
Given the high degree of diversity among members of the CPR group, the candidate CPR MAGs were further classified by constructing a concatenated maximum likelihood tree using 13 phylogenetic markers and 920 reference genomes, allowing the selection of closest relatives for metabolic potential comparisons ( Figure 2). The tree confirmed the previous taxonomical assignments. Cullars CPR MAGs affiliated with the Parcubacteria/Paceibacteria class, were all from the order UBA9983_A and distributed across 10 families that correlate to the broad groups Nomurabacteria, Taylorbacteria, Adlerbacteria and Kaiserbacteria in NCBI taxonomy. Additionally, two MAGs (DT 10 and DT 38) clustered with an unclassified CPR MAG to form a deeper branch within Saccharibacteria (Figure 2). These MAGs were classified by GTDB as belonging to the order UBA4664, which carries only 9 references, corresponding to 1.25% of all Saccharibacteria genomes in the database. Indeed, this low representation, whether due to sampling or environmental rarity, might explain the observed deeper branching, although it might be mitigated with the insertion of more reference genomes. The ABY1 group MAG was further classified into the Uhrbacteria group.

Metabolic potential of soil dwelling CPR
The MAGs recovered from Cullars Rotation soil represent three distinct CPR classes including the Saccharibacteria (10), Parcubacteria (16), and ABY1 (1). These MAG genomes (Figure 3) were compared to a curated collection of 124 genome references from NCBI (Saccharibacteria, 43; Parcubacteria, 49; ABY1, 32) to assess their metabolic potential utilizing the metabolic profiling tool DRAM (Supplementary Figure S2). ABY1, Saccharibacteria, and Parcubacteria MAGs carried partially complete glycolysis, pentose phosphate and reductive pentose phosphate pathways, as consistently observed among these classes of CPR. Additionally, they were found to have nearly complete F-type ATPase. All MAGs were lacking a complete tricarboxylic acid cycle (TCA), subunits of nicotinamide adenine dinucleotide (NADH) Frontiers in Microbiology 06 frontiersin.org dehydrogenase and electron transport chain (ETC), which indicates an anaerobic and fermentative metabolism, although the presence of complete or nearly complete F-type ATPases was widespread. Interestingly, Parcubacteria, namely from the family UBA2103, and Saccharibacteria from Cullars were more likely to carry mostly complete operons coding for cytochrome o ubiquitol oxidase. Cytochrome o functions by catalyzing the oxidation of ubiquinol (CoQH 2 ) into ubiquinone (CoQ), coupled with the reduction of O 2 into water, supplying energy to pump H+ into the outer membrane contributing to the formation of a proton motive force (Aussel et al., 2014). Interestingly, it was more common among the soil CPR than the references. The CAZyme database and its enzyme classes were utilized to understand Cullars MAGs carbohydrate metabolic potential, namely glycoside hydrolases (GH, carbohydrate degradation), glycotransferases (GT, carbohydrate biosynthesis), carbohydrate esterases (CE, hydrolysis of carbohydrate esters), carbohydratebinding modules (CBM, adhesion of carbohydrates), and auxiliary activities (AA, redox coenzymes for lignocellulose degradation) (Supplementary Figure S4). Most Cullars MAGs contained genes for enzymes associated with starch, cellulose/hemicellulose, and chitin degradation with the top hits among the Saccharibacteria and Parcubacteria, but not present in the Cullars ABY1 MAG (Figure 4). Interestingly, only Cullars Saccharibacteria carried the potential for lignin degradation. This continues to support soil CPR as containing the necessary enzymes for degradation of complex plant carbohydrates and soil necromass including bacterial and fungal cell walls. ABY1 and Saccharibacteria carbon substrate metabolic profiles were not found to be statistically different from that of the reference genomes available on NCBI (p > 0.05). However, Cullars Parcubacteria MAGs did demonstrate a strongly statistically difference from their references (p < 0.001). Parcubacteria was the most represented CPR in our MAGs and particular samples (i.e., DT.28) contained several hits for particular enzyme classes for peptidoglycan and chitin degradation, suggesting these soil microorganisms may be particularly adept for scavenging dead or decaying soil microflora.
Cullars CPR MAGs did not demonstrate any ability to participate in nitrogen or methane cycling. Select Saccharibacteria were predicted to have some capacity for dissimilatory and assimilatory sulfate reduction. Cullars MAGs were predicted to degrade organic forms of nitrogen, particularly, the amino acid methionine in the cysteine biosynthesis pathway. Likewise, Cullars CPR had many predicted peptidases, indicating the ability to degrade peptides from the soil environment.

Distribution of the ultra-small bacteria in Cullars rotation soil
Sampling the Cullars Rotation soil fertility plots offers a unique opportunity to assess the influence of nutrient availability and soil parameters on the soil CPR community, due to its century-old controlled nutrient deficiency experiments (Mitchell et al., 2011) (Figure 1). To reconstruct the MAG community in each plot, reads from each sample Classification of all Cullars MAGs and the proportion of each phyla. *Total MAGs does not include MAGs filtered out by the completeness, strain heterogeneity and contamination thresholds. Concatenated maximum likelihood tree of 13 phylogenetic marker genes from Cullars CPR and 920 reference genomes allowed further classification of CPR MAGs. MAGs were classified as Saccharibacteria (Saccharimonadia), ABY1 and Parcubacteria (Paceibacteria). The Parcubacteria group is expanded in the tree due to its intrinsic diversity. Zoomed in regions offers context to MAGs placement. Red dots denote Cullars MAGs.
Frontiers in Microbiology 07 frontiersin.org were mapped back against the curated MAGs and normalized to account for variations in library size and genome length (Figure 1). The normalized, filtered counts were used to reconstruct the communities in each of the plots in Cullars Rotation ( Figure 5A). Additionally, soil nutrient content for each plot was also measured creating a profile that could be correlated with changes in community structure ( Figure 5B). MAG communities are composed predominantly of Saccharibacteria, followed by Gemmatimonadetes and Dependentiae, even though the latter two have only one representative MAG each.
Despite having the higher and most diverse number of representatives, Parcubacteria accounted for a significantly smaller proportion of the community, indicating that taxa dominance was observed in the sampled metagenomes. Interestingly, there was an observed spike in Actinobacteria abundance in Plot 08, which received no pH correction treatment and had a significantly lower pH than other treatment groups.
MAG abundance within the different CPR taxa varied among the different experimental treatment groups (Supplementary Figure S5). In particular, Plot 08 (no pH correction) showed the greatest shift in Energetic metabolic potential of Cullars MAGs. % Complete: level of completeness of the pathway, measured as presence of genes coding for necessary enzymes to carry out each step.
Frontiers in Microbiology 08 frontiersin.org community structure, with 78% of the relative abundance corresponding to one Actinobacteria and two Saccharibacteria bins, which were dominant exclusively in this plot, while also having the highest number of recovered Parcubacteria genomes. Dependentiae and Gemmatimonadetes genomes were also recovered across all plots, with the later peaking in abundance under increased phosphorous amendment (Plot 05) and dropping to marginal abundance when there was no soil pH correction (Plot 08) indicating a susceptibility to very acidic soils. Otherwise, the abundance of CPR genomes was not significantly different among the different Cullars Rotation plots. NMDS ordination was used to visualize communities, dissimilarities in relation to fertilization treatment. To further explore the role of soil parameters in community compositions, the measurements of soil macro and micro-nutrients were modeled to the community distance matrix ( Figure 5B). Community profiles did vary significantly across plots (PERMANOVA, p = 0.001 and Betadisper, p = 0.8537) with the soil plots with an acidic pH having the most dramatic effect in clustering (p = 0.015). The lack of pH correction coupled with the addition of fertilization treatment resulted in the most acidic pH across all plots for Plot 08 (pH = 4.08 ± 0.08), surpassing even Plot C which received no amendment nor fertilization (pH = 4.58 ± 0.04); however, pH alone could not account for the observed differences (PERMANOVA, p = 0.199), with similar trends not observed for Plot C, which also has a very acidic soil. The environmental fit analysis reveals that the factors with the strongest correlation to the changes in the community were aluminum (Al) and iron (Fe) concentrations (p = 0.001 and p = 0.0001, respectively).

Overall microbiota community and network analysis
Next, the relationship between the MAG community and overall bacterial community was explored. 16S rRNA gene amplicon data were used to reconstruct the overall bacterial community, recovering representatives of 40 phyla. The overall community was dominated by Proteobacteria, Actinobacteria, Acidobacteria, Bacteroidota, and Firmicutes ( Figure 5C). Plot 08 had lower alpha diversity (Supplementary Figure S6) and an overrepresentation of the phylum Chloroflexi. Communities between plots were significantly different (p = 0.001), with Plot 08 also clustering away from other plots in PCA ordinations ( Figure 5D). Similarly to the MAG community shifts, an acidic pH was correlated with significant 16S community differences (p = 0.004); however, the only soil parameter with a significant correlation to the community was Al concentration (p = 0.001).
Linear discriminant analysis effect size (LEfSe) was used to identify taxa highly correlated to community shifts in each plot that can act as biomarkers. 18 OTUs were marked as Plot 08 biomarkers at varying taxonomical levels, with the strongest signal belonging to the phylum Chloroflexi, class Ktedonobacteria (Supplementary Figure S7).
The potential relationship between Cullars' MAGs and members of the soil microbiome was explored by constructing networks between Cullars' MAGS and soil phyla ( Figure 6). The MAGs were correlated with the occurrence of diverse phyla, albeit the weight of the edges was low. No particularly strong relationship was identified between the MAGs or any particular taxa in the overall soil microbiome, even when focusing on OTUs and MAGs distinctively Carbohydrate degradation potential of Cullars CPR MAGs and their references. Hits correspond to GH genes identified against the CAZy database, which was also used to obtain substract information. Circles represent the averaged hits for references and Cullars MAGs per substrate category.
Frontiers in Microbiology 09 frontiersin.org associated with Plot 08, such as MAGs DT35 and DT24, or Plot 08-specific biomarkers such as Chloroflexi or Acidobacteria taxa.

Discussion
Most studies of ultra-small bacteria have centered around the CPR phylum due to their unusual metabolic potential (Jaffe et al., 2020;Chiriac et al., 2022) and astounding phylogenetic diversity He et al., 2021). Despite their notable presence in groundwater, aquifers and associated sediment (Luef et al., 2015;Anantharaman et al., 2016;Chiriac et al., 2022), it was indicated that aquatic CPR are mobilized from soils (Herrmann et al., 2019). Hence soil became a focus of CPR study, supported by the reconstruction of CPR genomes from Amazonian (Kroeger et al., 2018) and Antarctic soils (Woodcroft et al., 2018), via deep-sequencing.
Following the observation of CPR in Cullars rotation soil in very low abundances (Santana-Pereira et al., 2020), we employed an enrichment protocol targeting the small fraction (< 0.45 μm) of the microbiota for the recovery of soil CPR with much modest sequencing resources then the otherwise terabyte efforts previously reported Woodcroft et al., 2018), although consideration of possible biases and incomplete sampling must be taken into account. This approach allowed the study of 14 different cultivated plots under different fertilization regimens, offering a unique opportunity to observe soil CPR communities under controlled soil nutrient variation. This study demonstrated that CPR are widespread in Cullars Rotation soils, and potentially in the soil environment generally, albeit in low abundance. Genome reduction (<1.5Mbp) and sparse metabolisms were also observed in Cullars Rotation soil-derived CPR genomes, pointing toward symbiotic life-styles, as previously observed (Brown et al., 2015;Castelle et al., 2018;Herrmann et al., 2019).
Cullars CPR had similar metabolic restrictions as their aquatic counterparts: incomplete nucleotide, amino-acid and lipid metabolisms. The lack of an TCA cycle and ETC elements, suggest that soil CPR are mainly fermentative, akin to previously characterized CPR (Castelle et al., 2018). However, while those CPR were associated with anoxic environments, Cullars CPR were collected from an aerobic fraction of the topsoil (Castelle et al., 2017). Interestingly, soil Saccharibacteria and   (Brown et al., 2015). Indeed, Coenzyme Q 8 (CoQ 8 ) plays an important role in oxidative stress by neutralizing reactive oxygen species (ROS) (Aussel et al., 2014), therefore it is possible that the combined action of CoQ 8 and Cytochrome o confers aerotolerance to soil CPR by scavenging ROS and O 2 simultaneously via CoQ 8 redox cycling (ubiquinone ↔ ubiquinol). However, the simultaneous presence of Cytochrome o and ATPases, suggest that Cullars Saccharibacteria and UBA2103 (Parcubacteria/Paceibacteria) carry some capacity for respiration, which has been previously observed and hypothesized in other CPR recovered from soil (Nicolas et al., 2021). Although, no NADH or succinate dehydrogenases were identified in Cullars CPR, ROS scavenging via ubiquinol formation could be a source of electrons for Cytochrome o and proton motor force creation. It is noteworthy that most MAGs carried ATPases, even in the absence Cytochrome o, or any other ETC component, suggesting that they might be used in yet unknown ways. CPR bacteria have been predicted to ferment complex substrates (Kantor et al., 2013), and Cullars Saccharibacteria and Parcubacteria were particularly adept at degrading a variety of carbohydrates including plant-derived sugars (Starr et al., 2018;Taubert et al., 2019), suggesting their participation in plant necromass turn over and Carbon cycling in the soil. However, due to their patchy fermentation pathways, it is unclear if these organisms preferentially derive their energy from direct biomass degradation or from scavenging processed derivatives Figueroa-Gonzalez et al., 2020).
Despite different selective pressures in aquatic environments and soils, the sugar degradation profiles of Cullars CPR and references were surprisingly similar, with the retention of plant-matter degradation enzymes by the later, thus corroborating the observation that aquatic CPR are mobilized from soils (Herrmann et al., 2019). On the other hand, Cullars Parcubacteria displayed distinct enrichment of chitin and peptidoglycan degradation enzymes and cytochrome o complexes, suggesting soil specific adaptations toward scavenging dead or decaying soil microfauna under oxic conditions. While some CPR have been shown to live an endosymbiotic lifestyle (Gong et al., 2014;Bor et al., 2018), we did not find specific correlations between CPR MAGs and specific bacterial taxa, which points toward reliance on available bacterial cohorts in each plot for the supply of the needed building blocks , perhaps forming episymbiotic relationships with diverse hosts (He et al., 2021), instead of forming strict host-specific interactions. Unsurprisingly, the minimetagenome approach identified other symbiotic bacteria within these soils including environmental Chlamydia and Dependentiae (TM6), which are other examples of bacterial lifestyles that result in genome reduction (McCutcheon and Moran, 2011).
The Cullars Rotation soil fertility experiment offered a unique resource to observe how the CPR community responds to controlled nutrient deficiencies in soil. Although they were not likely directly used by CPR, they have great influence in plant matter input and microbiota activity (Zhao et al., 2015). Yet, nutrient availability changes did not influence CPR communities, indicating that their particular niches are not disturbed by the observed fluctuations in the overall microbiota composition and activity, nor by nutrient bioavailability and crop growth. Moreover the lack of taxa-specific (A) Network analysis filtered by all the OTUS belonging to the main phyla from the overall microbiota that interacts positively with the Cullars MAGs. Line thickness corresponds to correlation weight (B) Network analysis filtered by all the OTUS belonging to the plot biomarkers (defined by LEfS analysis) that interacts positively with the Cullars MAGs. Diamond shaped nodes: Prominent MAGs and biomarkers in Plot 08. Microbiology  11 frontiersin.org trends in CPR distribution across the plots and the ability of all Cullars CPR to withstand and grow under diverse conditions suggests that soil CPR have dynamic ecological roles (Jaffe et al., 2020). CPR communities, and the overall microbiota, were the most affected by a sharp decrease in pH caused by lack of pH correction resulting in heavy metal solubilization and potential toxicity, leading to decreased crop yields (Mitchell et al., 2005), microbiota diversity and activity (Zhao et al., 2015). Despite that, some Cullars CPR, especially two particular Saccharibacteria MAGs, thrived and outcompeted others, dominating the community. Although CPR have been observed in extreme conditions, usually Microgenomates and Parcubacteria (Harris et al., 2004;Gagen et al., 2020;Hosokawa et al., 2021), Saccharibacteria are not commonly associated with these environments.

Frontiers in
More extensive sampling could have the potential to uncover more CPR diversity, since rarefaction curves show that most of the plots were under-sampled. Indeed, Cullars CPR sequences previously captured in a soil metagenomic library (Santana-Pereira et al., 2020) were not observed in this study. More stringent filtering (Nicolas et al., 2021) or increased sequencing depth could be implemented in future studies to better MAG construction and thus diversity recovery.
Nevertheless, these findings offer valuable insight into the soil dwelling CPR community and their possible ecological niches. We have shown that CPR are rare but widespread in soils, despite differences in nutrient availability and microbiota composition. Although soil CPR share many of the metabolic limitations of groundwater or sediment-associated CPR (Jaffe et al., 2020;Lemos et al., 2020), there is a notable bias toward degradation of plantderived matter and even the development of specific adaptations of oxic environments, including the potential for a unique respiration pathway coupled with ROS scavenging. By observing the CPR communities in parallel, we were able to verify that Cullars CPR do not form strict symbiotic relationships, nor depend on specific hosts for survival, behaving like scavenging ectosymbionts. Cullars CPR appear to live in the fringes of the microbiota undisturbed due to their low abundances and resourceful metabolisms, allowing them to thrive under adverse conditions. Future studies should explore the contributions of soil CPR bacteria to important biogeochemical processes and especially plant-microbial interactions.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://www.ncbi.nlm.nih.gov/genbank/, BioProject accession number PRJNA901447.