Oligotyping reveals stronger relationship of organic soil bacterial community structure with N-amendments and soil chemistry in comparison to that of mineral soil at Harvard Forest, MA, USA

The impact of chronic nitrogen amendments on bacterial communities was evaluated at Harvard Forest, Petersham, MA, USA. Thirty soil samples (3 treatments × 2 soil horizons × 5 subplots) were collected in 2009 from untreated (control), low nitrogen-amended (LN; 50 kg NH4NO3 ha-1 yr-1) and high nitrogen-amended (HN; 150 kg NH4NO3 ha-1 yr-1) plots. PCR-amplified partial 16S rRNA gene sequences made from soil DNA were subjected to pyrosequencing (Turlapati et al., 2013) and analyses using oligotyping. The parameters M (the minimum count of the most abundant unique sequence in an oligotype) and s (the minimum number of samples in which an oligotype is expected to be present) had to be optimized for forest soils because of high diversity and the presence of rare organisms. Comparative analyses of the pyrosequencing data by oligotyping and operational taxonomic unit clustering tools indicated that the former yields more refined units of taxonomy with sequence similarity of ≥99.5%. Sequences affiliated with four new phyla and 73 genera were identified in the present study as compared to 27 genera reported earlier from the same data (Turlapati et al., 2013). Significant rearrangements in the bacterial community structure were observed with N-amendments revealing the presence of additional genera in N-amended plots with the absence of some that were present in the control plots. Permutational MANOVA analyses indicated significant variation associated with soil horizon and N treatment for a majority of the phyla. In most cases soil horizon partitioned more variation relative to treatment and treatment effects were more evident for the organic (Org) horizon. Mantel test results for Org soil showed significant positive correlations between bacterial communities and most soil parameters including NH4 and NO3. In mineral soil, correlations were seen only with pH, NH4, and NO3. Regardless of the pipeline used, a major hindrance for such a study remains to be the lack of reference databases for forest soils.


INTRODUCTION
Soils harbor an immense diversity of bacteria (Torsvik et al., 2002;Trevors, 2010 and references therein), most of it is hidden from experimental analyses (Wall et al., 2010). A vast majority of soil microbes are recalcitrant to culture methods thus increasing the complexity of any study to expose this concealed diversity (Sait et al., 2002;Nunes da Rocha et al., 2009;Vartoukian et al., 2010;Lombard et al., 2011). Recently developed molecular tools have enabled us to analyze the expanse of variation in bacterial populations using culture-independent methods such as polymerase chain reaction (PCR) amplification of partial or Abbreviations: Con, control; LN, low nitrogen; HF, Harvard Forest; HN, high nitrogen; Min, mineral; Org, organic; CT, confidence threshold. full-length genes of 16S rRNA, and their in-depth sequencing (Janssen, 2006;Větrovský and Baldrian, 2013). Next generation sequencing approaches (e.g., pyrosequencing and Illumina technology) generate data that are several orders of magnitude superior than traditional sequencing methods; still they have limited ability for the assessment of the total microbial diversity in a soil sample, albeit the taxonomic richness (Margulies et al., 2005;Huse et al., 2009;Gloor et al., 2010). This is primarily due to the lack of reference genome libraries even for the dominant bacterial species in forest soil ecosystems (Howe et al., 2014).
In nature, microbes are vital contributors to biogeochemical transformations and hence examining their response to anthropomorphic activities over long periods is critical for www.frontiersin.org understanding ecological processes (Falkowski et al., 2008). Several studies have been conducted to reveal the extent of diversity of the soil bacterial and fungal communities being influenced by various factors including aboveground plant populations (Carney and Matson, 2006;Uroz et al., 2010;McGuire et al., 2012), soil type (Roesch et al., 2007;Lauber et al., 2008), soil pH (Fierer and Jackson, 2006;Lauber et al., 2009;Rousk et al., 2010), and differences in geographic location (Fulthorpe et al., 2008;Langenfeld et al., 2013;Bischoff et al., 2014). It is evident that soil microbial communities are influenced by numerous human activities, particularly land management practices, including long term nitrogen (N) fertilization of both agricultural and forest soils (Wu et al., 2008(Wu et al., , 2011Hallin et al., 2009;Ramirez et al., 2010;Fierer et al., 2012;Sridevi et al., 2012;Coolon et al., 2013;Turlapati et al., 2013). Wagg et al. (2014) have recently stressed the importance of changes in soil microbial diversity on nutrient cycling at several sites.
During the 1990s, N added to the atmosphere through human activity was much higher (160 Tg Y −1 ) than through natural biological fixation processes (110 Tg Y −1 ) (Gruber and Galloway, 2008). These authors suggested that increased N has multiple effects on cycling of other elements in the environment including carbon (C), leading to global warming. Acidification due to N saturation causes forest soils to become deficient in important labile pools of nutrients, particularly Ca 2+ and Mg 2+ (Currie et al., 1999). Calcium deficiency in the soil is known to predispose plants to disease and pathogen infection, thus contributing to decline in forest productivity as seen in red spruce and sugar maple stands in the Northeastern US (Shortle and Smith, 1988;Long et al., 1997;Minocha et al., 2010;Schaberg et al., 2011). These aboveground changes are often accompanied by belowground changes in soil chemistry that alter the microbiome, which in turn impacts the biogeochemical cycling of essential nutrients (N, C, and P). Recent studies have reported reduced microbial biomass and activity with N fertilization of forest soils (Treseder, 2008;Janssens et al., 2010). Other reports have indicated either an increase (Cusack et al., 2010), or a neutral response  in microbial biomass with N fertilization. The variability of these findings suggests that the effects of N addition on microbial populations may be site-specific.
At the HF Long-Term Ecological Research site located in Petersham, MA, USA (HF) 1 , experimental plots were set up in 1989 to study the long-term effects of N addition on above-and belowground communities . Past studies from this site have shown negative shifts in the ratio of fungal: bacterial biomass, microbial biomass C, and substrate-induced respiration rates in response to N additions (Bowden et al., 2004;Frey et al., 2004;Wallenstein et al., 2006;Ramirez et al., 2012). Soil from N treatment plots at HF were reported to accumulate more C due to a decrease in decomposition rate (Frey et al., 2014). Restriction fragment length polymorphism (RFLP) and PCR profiles for DNA extracted from N-treated soil samples exhibited altered functional N-cycle gene composition (Compton et al., 2004). Using pyrosequencing of the PCR-amplified 16S rRNA genes from the soil DNA, our group showed that N addition caused profound 1 http://harvardforest.fas.harvard.edu/research/LTER rearrangements in the structure of bacterial communities at the HF site (Turlapati et al., 2013). Major changes were recorded in the community structure of Acidobacteria, α and β subclasses of Proteobacteria, and Verrucomicrobia were observed. These conclusions were derived using the UCLUST tool in Quantitative Insights into Microbial Ecology (QIIME) toolkit using the latest version available (1.4.0) at that time (Caporaso et al., 2010b;Edgar, 2010) to cluster the sequences into operational taxonomic units (OTUs) at 97% sequence identity. It was observed that 2% of the total OTUs in this dataset contained ≥50% of the total sequences; on the other hand, up to 80% of total OTUs were highly diverse and contained ∼10% of the total sequences.
It has been suggested that there is substantial phylogenetic diversity in marine and soil environments attributable to the occurrence of rare bacterial populations (Sogin et al., 2006;Lynch et al., 2012). Although the diversity of HF soil microbes could be estimated to some extent in our previous study, it was not possible to examine the sequence diversity within each abundant OTU since classification was assigned only to the OTU representative sequences (Turlapati et al., 2013). This is important because OTU clustering (often done at 97% sequence identity) is less powerful in identifying phylogenetically distinct organisms that differ by a small number of nucleotides (Eren et al., 2013). Oligotyping is a recently developed computational tool that allows users to choose entropy components ('supervised tool') that have high variability in order to resolve underlying diversity among sequences within each OTU or taxonomic group (Eren et al., 2013).
With the aim of analyzing the effects of prolonged N treatment on individual bacterial groups, and identifying additional genera/families whose presence and/or abundance may be correlated with alterations in soil factors, we subjected our pyrosequencing dataset (Turlapati et al., 2013) to the oligotyping pipeline and taxonomy was assigned using the recently updated RDP database (Cole et al., 2013). The specific objectives of the study were to: (1) demonstrate the applicability of the oligotyping pipeline for forest soil datasets; (2) study the effects of N-amendment on individual bacterial taxa and compare these with previous findings based on OTU clustering; and (3) evaluate the effects of soil chemistry on bacterial communities of Org and Min horizons.

SITE DESCRIPTION AND SOIL SAMPLE COLLECTION
The study site is a mixed hardwood stand naturally regenerated after being clear-cut in 1945 and is located on Prospect Hill at the HF 2 , Petersham, MA, USA. The stand is comprised of predominantly red oak (Quercus rubra L) and black oak (Q. vetulina Lam.), mixed with red maple (Acer rubrum L.), American beech (Fagus grandifolia Ehrh.) and black birch (Betula lenta L.). Soil at this site is mostly stony to sandy loam formed from glacial till. For more details on site description including vegetation, climate, site topography, and N amendments refer to Aber et al. (1993) and Magill et al. (2004).
As described in our previous report (Turlapati et al., 2013), three 30 m × 30 m treatment plots (further subdivided into 36 subplots; each measuring 5 m × 5 m) were used for sample collection.
These plots were established in May 1988 as part of a long-term study on chronic N effects on ecosystem function. One plot served as a Con and 2 other plots were treated with NH 4 NO 3 ; low N (LN; treated with 50 kg N ha −1 yr −1 ), and high N (HN; treated with 150 kg N ha −1 yr −1 ). Ammonium nitrate solution was applied by a backpack sprayer yearly in six equal doses at 4-week intervals from May to September. Soil samples were collected from five randomly selected subplots within each treatment plot in September 2009 using a soil corer (7.5 cm diameter). The upper Org layer (Org, average 8 cm) was separated from the lower Min layer. In a few cases where the Org horizon was 10-12 cm deep, deeper coring was needed to get to the Min soil. Thirty samples (5 cores per plot × 2 horizons × 3 treatments) were collected in polyethylene bags and brought to the laboratory on ice. Samples were sieved (2 mm pore size) to remove roots, debris and stones, and then stored at −20 • C for further use.

SOIL CHEMICAL ANALYSES
Air-dried soil samples (20-40 g) were sent to the Soil Testing Service Laboratory at the University of Maine, Orono, ME, USA 3 for analyses. Nitrate and NH 4 -N were extracted in potassium chloride and determined colorimetrically by Ion Analyzer in 2012. The rest of the analyses were carried out in 2010 as described in Turlapati et al. (2013). The methods for the extraction of polyamines and amino acids were described in our previous publication (Frey et al., 2014).

DNA ISOLATION, PCR, PYROSEQUENCING, AND DATA QUALITY FILTERING
As previously described in Turlapati et al. (2013), PowerSoil® DNA isolation kit (MO-BIO Laboratories, Carlsbad, CA, USA) was used to isolate genomic DNA from 0.5 g of soil samples. Universal primers (F968 5 AA CGC GAA GAACCT TAC3 and R1401-1a 5 CGG TGT GTA CAA GGC CCG GGA ACG3 ) as described in Brons and van Elsas (2008) with 30 barcodes (10 bp, one for each soil sample) were used for PCR to generate ∼433-bp amplicons corresponding to the V6-V8 hypervariable region of the bacterial 16S rRNA encoding gene. The PCR amplifications were conducted in triplicate using Phusion® Taq Master Mix (New England Biolabs, Ipswich, MA, USA) with 50 ng of template DNA in a final volume of 50 μL. The reactions were performed in a PTC-100® Programmable Thermal Cycler (MJ Research, Inc., Waltham, MA, USA) with the following conditions: an initial denaturation at 95 • C for 5 min, followed by 20 cycles of denaturation at 95 • C for 30 s, annealing at 61 • C for 30 s, and extension at 72 • C for 45 s, with a final extension at 72 • C for 10 min. The triplicate reaction products (amplicons) from each soil sample were pooled for sequencing. DNA purification kit (Zymo Research, Irvine, CA, USA) was used to purify the pooled PCR products which were then subjected to further cleaning via the Agencourt® AMPure® XP Bead Purification method (Agencourt Bioscience Corporation, Beverly, MA, USA) to remove fragments <100 bp. The quality of the PCR products were evaluated in an Agilent 2100 Bioanalyzer using the DNA 1000 LabChip (Agilent Technologies, Palo Alto, CA, USA). The 30 bar-coded samples were pooled in equimolar 3 http://anlab.umesci.maine.edu quantities (Margulies et al., 2005) in order to process for sequencing (Roche 454 GS-FLX Titanium System) at the University of Illinois, USA 4 in a full picotiter plate.

OLIGOTYPING ANALYSIS
The forward primer (549,500 sequences) pyrosequencing data were quality filtered in QIIME (version 1.4.0) with default settings for most steps as described in Caporaso et al. (2010b). Chimeric sequences were also removed using Chimera Slayer in QIIME (Table S1). After removal of low quality and chimeric sequences, the remaining data were used as an input taxonomic assignment using the Ribosome Database Project (RDP) classifier version 2.7 5 . In the first step, IDs of the sequences corresponding to each phylum were extracted individually from the dataset using CT ≥0.8 with Perl script (Supplementary Material -1). The sequences corresponding only to these IDs (selected at CT ≥0.8) were then extracted out by using the command filter.fasta.py (available in QIIME) and were further processed for oligotyping analyses. Selected sequences corresponding to a phylum or a subgroup/class were aligned using PyNAST (Caporaso et al., 2010a). PyNAST enables the alignment of sequences against a template database such as Greengenes (McDonald et al., 2012) in QIIME. Before beginning oligotyping, the uninformative gaps in the alignment were removed along with 10-15 nucleotides at the end of each read, which were trimmed to attain reads of similar lengths.
Oligotyping was conducted individually for each phylum. The only exceptions were Proteobacteria and Acidobacteria where oligotyping had to be conducted individually for each class or subgroup because among some of the subgroups there were several-fold differences in the number of sequences (e.g., within Acidobacteria, subgroup Gp1 accounted for 151,462 sequences and Gp4 had only 715 sequences); these required different M values (the minimum substantive abundance of the most abundant unique sequence in an oligotype) for analyses. To begin with all sequences were 430 bp in size; after alignment and filtering the gaps, the sizes of the aligned reads were different for different phyla ( Figure S1).
Entropy and initial oligotyping analyses were conducted according to Eren et al. (2013). The oligotyping method utilizes Shannon (1948) entropy for detecting the amount of diversity associated with each nucleotide position and provides a way of identifying positions associated with greater variability. The entropy peaks for nucleotide positions ranged from 0 to slightly >2 for most of the datasets under consideration with the exception of the phylum Nitrospira ( Figure S1). We observed that a higher number of peaks with entropy values ≥1.0 resulted in greater bacterial diversity. In the first step, oligotypes were generated by using the first position with the highest entropy value. To decompose these oligotypes further, supervised oligotyping steps (user-defined nucleotide selection for decomposing entropy) were used. On the average 50 positions were chosen to decompose entropy for most datasets. Oligotyping was continued until no peaks were left unresolved that could further decompose the oligotype. Peaks with entropy values of <0.2 often did not yield additional oligotypes and were considered background noise. Details for the above steps are described in Eren et al. (2013).
For supervised oligotyping, M values had to be modified for forest soil samples from those suggested for other ecosystems by Eren et al. (2013). Since the total number of sequences varied across different bacterial phyla, the M values varied accordingly for each analysis as shown in Table S2. This value was usually lower (2-5) for relatively smaller datasets in the range of 200-5000 reads. In order to correct for technical errors due to pyrosequencing, the value for parameter 's' ('s' is the minimum number of samples in which an oligotype is expected to be present) was set to two with the assumption that any sequence that occurs in two biological samples represents an element of a microbiome and is not an error. Huse et al. (2010) found that even with deep sequencing, OTUs with one sequence rarely occurred in other replicates and the chance of a spurious OTU occurring in two environmental samples is not realistic which supports our assumption. To capture the rare biosphere in soil samples, no values were assigned to parameter 'a' (the minimum percent abundance of an oligotype in at least one sample) and parameter 'A' (the minimum actual abundance of an oligotype across all samples). Oligotype representative sequences (the most abundance sequence within an oligotype) were classified at CT ≥0.8 (a generally accepted threshold of 80% for assigning taxonomy) using the RDP classifier tool version 2.7 6 . For determining percent alignment scores, the sequences corresponding to individual oligotypes were extracted from the oligo-representatives directory and aligned using the ClustalW2 tool 7 .
In one small comparative study with a subset of data, oligotype representative sequences were classified at CT ≥0.5 as well as ≥0.8 using the RDP classifier to determine if additional genera could be identified using a lower CT value.
The oligotype representative sequences have been deposited in the NCBI short read archive. The accession numbers are presented in Table S1.

OTU CLUSTERING AND TAXONOMIC ANALYSES
To compare the oligotyping method with OTU clustering, qualityfiltered reads of four phyla (Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria) were selected and clustered into OTUs using UCLUST (Edgar, 2010) set at a 97% identity threshold in QIIME (version 1.8.0). The OTU representative sequences were picked in QIIME based on the most abundant sequence in each OTU. Similarly, the oligotyping method also assumes that the most abundant unique sequence is the oligotype representative sequence. For each dataset, sequences were filtered for minimum abundance (n size) for each OTU using the same value that was used for M in the corresponding oligotyping analyses. In addition, in order to match the parameter set for oligotyping (s = 2), OTUs that were not present in at least two samples were removed from the OTU table using python scripts (Supplementary Material -2). The OTU representative sequences were classified using RDP version 2.2 (currently used version) in QIIME.
In order to understand the reason for the difference in the genera identified by the two methods, we assigned taxonomy to OTU and oligo representative sequences using RDP in QIIME pipeline version 2.2 and online RDP classifier version 2.2.

STATISTICAL ANALYSES
SYSTAT (version 10.2, Systat Software, Inc., San Jose, CA, USA) was used for standard statistical tests, including paired t-tests and two-way analysis of variance (ANOVA), on the soil NH 4 and NO 3 data. Non-metric dimensional scaling (NMS) analyses were conducted using PC-ORD (version 6.03, MJM Software Design, Gleneden Beach, OR, USA). To normalize the data, digit one was added to all data before log10 transformation. Briefly, following settings were used for NMS: number of axes = 4, maximum number of iterations = 500; stability criterion (the standard deviation in stress over the last 10 iterations) = 10 −6 ; number of runs with real data = 100; and the number of runs with randomized data = 250. Random numbers were chosen as a source of starting ordinations. The tie handling was done by penalizing unequal ordination distance (Kruskal's secondary approach). The following were chosen as output options: varimax, randomization test, plot stress vs. iterations and calculate scores for OTUs by weighted averages. Dimensionality of solutions was selected for these analyses based on the assessment using a graph of stress as a function of dimensionality (scree plot). A Monte Carlo test was used to examine the stress and the strength of NMS results. Two-way permutational MANOVA was conducted using the Bray-Curtis distances to evaluate the effect of the horizons and the treatment, and the interaction between them. Mantel tests were conducted to evaluate the significance of correlations among Bray-Curtis distance scores and soil chemistry and soil Org N metabolites.

SOIL CHEMISTRY
While NH 4 concentration in the Org soil was significantly higher than that in the Min soil for all treatments (Table S3), there was no difference in NO 3 levels between the two horizons. Long-term N treatment did not significantly alter either NH 4 or NO 3 concentrations of either soil horizon (Table S3). Other details on soil analyses are described in Frey et al. (2004) and Turlapati et al. (2013).

PARAMETERS OF OLIGOTYPING ANALYSES
Soils are highly diverse and harbor an abundance of rare microbes. Rare microbes are more prone to primer-PCR amplification and sequencing biases thus making it harder to identify such individuals. In addition, it is well known that soil replicates have high microsite variability in chemistry as well as bacterial populations. This combination of rarity and microsite variability perhaps is the reason that the same microbes are not present in all replicate samples, and why the guidelines suggested for other biomes in Eren et al. (2013) did not work with the HF soil samples. Thus for analysis of these soils, the suggested guidelines had to be modified (personal communication with Dr. A. Murat Eren, Marine Biological Laboratories, Woods Hole, MA, USA).

Frontiers in Microbiology | Systems Microbiology
M is the value that is used to filter out potential noise in a sample. For this reason it is generally kept at a reasonably high number. It is likely that with forest soils, sequences representing rare taxa may be filtered out as noise due to their low abundance at high M values. In addition, high M values in such cases filtered out more than 50% of all sequences. The more diverse the phyla are (in terms of number of entropy components) more are the sequences that are filtered out with high M values (Table S2, e.g., see Bacteroidetes vs. Gp10). With the goal of retaining maximum sequences and diversity in terms of the number of oligotypes, several M values were tested for most phyla before final analyses. We tested different M values using αand β-Proteobacteria, two wellknown bacterial classes that are highly diverse and varied in the size of data for our soil samples with 38,858 and 3,340 sequences, respectively. The s value was set at two for both analyses. Lowering M values resulted in the identification of more genera at CT ≥0.8 in both classes ( Table 1). For example at CT of ≥0.8, the total number of genera identified with an M = 75 for α-Proteobacteria was 5, but with M = 15, 11 genera were identified. Similarly, for β-Proteobacteria, M = 25 identified only two genera while M = 3 identified eight genera ( Table 1) In α-Proteobacteria, genera such as Labrys and Acidisoma were identified with M = 15; they would be missed at higher M values. Similar results were obtained within β-Proteobacteria (Table 1). Therefore, final data analyses in this study were conducted using an M value that was based on two criteria, namely, the retention of maximum sequences, and maximum diversity (in terms of number of oligotypes) with taxonomic assignment at CT values no less than 0.8. Data presented here show that using high M values for groups that have low abundance would not identify genera with high confidence limits [e.g., at M = 2 Paenibacillus (Firmicutes) at CT = 1 and Mucilaginibacter (Bacteroidetes) at CT of 0.99-1 were identified].

ALIGNMENT OF SEQUENCES WITHIN OLIGOTYPES
We compared the sequence identities within OTUs clustered at ≥97% similarity (Edgar, 2010) in QIIME with those in oligotypes that are generated by manually selecting components of high entropy values. The results show that the sequence identities within an oligotype ranged between 99.5 and 100%. In order to reduce variation among sequence identities within an oligotype, all peaks with entropy values greater than 0.6 were resolved. The oligotyping process left very few unresolved peaks of relatively low entropy (<0.2) values within oligotypes that often had <100 sequences. These low entropy peaks were considered as the background noise (Table 2). However, even when the oligotyping analysis appeared not fully resolved, the range of % identities www.frontiersin.org was still within 99.76-100%. Table 2 shows two such examples: in one case there are two peaks with entropy values of <0.25, and in another case there is only one large peak with an entropy value at 0.65. In such cases, the small peaks were deliberately disregarded because their further decomposition did not result in additional oligotypes. The sequence identities within OTUs clustered using the same dataset ranged from 97 to 100% (results not shown).

TAXONOMIC ASSIGNMENTS USING DIFFERENT TOOLS
Major motivation for examining the use of oligotyping as a tool was to reveal concealed microbial population diversity that could not be seen using the OTU clustering approach (Turlapati et al., 2013) with the additional focus on the effects of N treatment on the distribution of these taxa. Comparison of taxonomic assignments between OTUs described earlier using QIIME 1.4.2 and the data presented in this report show more genera were identified using the oligotyping analysis ( Table 3). Although the same data set was used for both analyses, genus level information for phyla such as Acidobacteria, Bacteroidetes, and Chloroflexi was generated in the present study. It should be pointed out that the use of an updated version (2.7) of RDP also revealed four previously unidentified phyla (AD3, Cyanobacteria, TM6 and WPS-2; Table S2). Whereas OTU clustering resulted in 2% of the OTUs containing ∼50% of the total sequences (Turlapati et al., 2013), for oligotyping, 2% of the oligotypes contained ∼38% of the total number of sequences (results not shown). Another major difference was that 80% of the OTUs contained 10% of sequences in comparison to 20% sequences in 80% of oligotypes. A comparison of two databases (Greengenes used in QIIME vs. RDP online database) for classifying OTUs as well as oligotypes representative sequences resulted in a significantly lower number of taxa identified from both OTU and oligotype datasets with Greengenes as compared to RDP (Table 4). More genera were discernible when CT ≥0.5 was used (vs. ≥0.8) with oligotype representative sequences, (Table S4). One such example is the genus Terriglobus (Acidobacteria-Gp1), which was identified only at CT ≥0.5 ( Figure S2).

BACTERIAL DIVERSITY ANALYSES
Details on the various steps of oligotyping analysis and the number of oligotypes identified for each phylum are given in Table S2. In general, no direct correlation was observed between the numbers of sequences and the oligotypes. In all, sequences affiliated with 73 known genera were identified from these soils as compared to 27 genera observed in our previous study (Table 3). Oligotyping revealed that sequences corresponding to some genera were present in control but absent in N-treated soils. In other instances, those present in N-amended plots were not found in control plots.

BACTERIAL COMMUNITIES AND TREATMENT RELATIONSHIPS
Non-metric multidimensional scaling (NMS) followed by permutational MANOVA (Permanova) with ordination scores, i.e., the Bray-Curtis distance, obtained from normalized total oligotype data revealed significant differences among bacterial communities based on treatment and soil horizon (Figure 1; Table S5). In general, all five replicate soil samples from within the same treatment plot clustered together and displayed stronger similarities among their oligotypes as compared to those from other treatment plots. For the two largest phyla, Acidobacteria and Proteobacteria, NMS and Permanova analyses were conducted for individual subgroups and classes, respectively ( Figure S3; Table S5). With the exception of Bacteroidetes and subgroup Gp10 in Acidobacteria, the bacterial communities of the two horizons were significantly different. Additionally, within each soil horizon significant treatment effects on the structure of the bacterial community were observed for all phyla except for Chloroflexi, Firmicutes, TM7, Bacteroidetes, and subgroups Gp6 and Gp10 of Acidobacteria (Figure 1 and Table S5; Figure S3).
The highest variation in partitioning was observed in the phylum Acidobacteria, where horizon explained 49.0% (P ≤ 0.0002) of the variation among samples, and treatment accounted for about 21% (P ≤ 0.0004) of the variation (Figure 1A; Table S5). Most of this variation was due to subgroups Gp1, Gp2, and Gp3 ( Figure  S3). In the second most diverse phylum, Proteobacteria, horizon explained 36% (P ≤ 0.0002) of variation among the samples and the treatment accounted for 16.5% (P ≤ 0.0006) of the variation;

Total genera 27 73
Genera were identified at CT ≥0.8 using the online RDP classifier. *The genera Byssovorax and Actinocorallia were identified by OTU clustering. a major part of this variation was seen in α-and γ -Proteobacteria ( Figure S3). Often there was an overlap between the LN-and HNamended soil communities. Treatment effects were generally more pronounced in the Org soil horizon.

SOIL CHARACTERISTICS AND BACTERIAL COMMUNITIES
Mantel test results on soil chemistry and the bacterial community revealed strong correlations for the Org soil. Pooled oligotyping data from all phyla found in the Org soil horizon showed a stronger positive correlation between the entire bacterial community and soil pH, Ca, P, K, Zn, Mg, NH 4 , NO 3 , and total C ( Table 5) as compared to Min soil. When data for each phylum were analyzed separately, significant correlations were observed for Acidobacteria, Proteobacteria, Actinobacteria, Verrucomicrobia, WPS-2, and AD3. Exceptions included Proteobacteria which showed no correlation with Ca and P, nor did Chlamydiae with NH 4 and total C. The remaining phyla had correlations with fewer elements, specifics of which varied ( Table 5).
Oligotyping data pooled for all analyzed phyla from the Min horizon community showed strong positive correlations only with soil pH, acidity, NH 4 , and NO 3 . With few exceptions, this was www.frontiersin.org  true for analyses of each individual phylum. There was a positive correlation of Al with Acidobacteria and WPS-2, and Ca with Proteobacteria (Table 5).

CHANGES IN COMMUNITY STRUCTURE OF GENERA ASSOCIATED WITH SOIL HORIZON AND N-AMENDMENTS
Across all three treatments, oligotypes corresponding to five genera (Ammoniphilus, Clostridium X1, Solibacillus, Sporosarcina, and Viridibacillus) were found in Min soils but were absent in Org soils, and one genus (Conexibacter) was identified in Org soils but not in Min soils (Table 6; Figures 2A,C). Overall, oligotypes corresponding to five genera (Aquabacterium, Nitrosospira, Yersinia, Legionella, and Niabella) appeared exclusively in N-treated soils (in both horizons combined), while those representing eight genera (Comamonas, Microbacterium, Mycetocola, Brochothrix, Flavobacterium, Pedobacter, Sphingobacterium, and Terrimonas) were present in the control but absent from the N-treated soils ( Table 6; Figures 2B,C). A comparison of the presence of oligotypes specific to each genus revealed that some oligotypes were unique to Org soils while others were exclusively present in Min soils (Table S6; Figure 2). Additionally some oligotypes were present only in the N-amended soils and not in the control plots. Although Figures and Tables show all identified genera, only those that exhibited significant differences with treatment or soil horizon are discussed in this section.

TAXONOMY BY PHYLUM
Acidobacteria, which was the most abundant phylum in the HF soils, constituted ∼50% of total sequences and 22% of total oligotypes (Table S2). Subgroups Gp1, Gp2, and Gp3 accounted for most of the Acidobacteria sequences and oligotypes. Soil samples from control plots had three times more sequences affiliated with Gp1 as compared to Gp3. However, based on the total number of oligotypes, the Gp3 subgroup was more diverse than Gp1 in both soil horizons (Figures S4A,B). Gp13 was represented by a large number of oligotypes each containing small sets of sequences.
Four new genera were identified in the phylum Acidobacteria since our last report with the same soil samples ( Table 3). All of the genera identified in the Org horizon were represented by a significantly larger number of sequences as compared to those identified in the Min soil horizon for all three treatments ( Figure  S5A). However, the number of oligotypes did not vary greatly between soil horizons. Sequences of the genus Edaphobacter were significantly higher in HN vs. the control while Acidobacterium, Granulicella, and Bryobacter showed a reverse trend in the Org horizon ( Figure S5A). Oligotypes for Edaphobacter were more abundant in N treatment plots relative to control in Min soil of both ( Figure 2C). Proteobacteria was the second most abundant phylum and was highly diverse in these soils. This phylum comprised ∼20% of the total sequences and total oligotypes (Table S2). Classes α-Proteobacteria and γ -Proteobacteria together accounted for >70% of this phylum's sequences and oligotypes in each horizon ( Figures S6A,B). Regardless of the number of sequences, oligotyping data revealed that among all other classes and phyla, α-Proteobacteria were the most diverse in both soil horizons. In general, LN treatment was positively correlated with sequences and oligotype numbers. Altogether, 30 genera were found in Proteobacteria ( Figures S5B-D, Table 3): 11 each in αand γ -Proteobacteria ( Figures S5B-D), and eight in β-Proteobacteria ( Figure S5C). Fourteen out of these 30 genera were present in relatively high abundance (sequences ≥50). Shifts in community structure were observed among most genera in response to N treatments ( Figures  S6A,B). An increase in the number of sequences was observed for several genera in soil treated with LN ( Figures S5B,D). and (L) AD3. Each soil type is represented by 5 replicates and a centroid, which is indicated by a single symbol and treatment-soil horizon name. The number next to the phylum name represents the total number of oligotypes identified within each phylum. H = % of variation partitioned by horizon and T = % of variation partitioned by treatment.

Frontiers in Microbiology | Systems Microbiology
Actinobacteria constituted only 7% of total sequences and 7.6% of oligotypes (Table S2). Eleven genera were identified from this phylum relative to the prior findings of seven (by OTU clustering) for the same data set ( Table 3). The absence of certain genera in N-amended soils was observed in a few cases ( Figure S5E).
Verrucomicrobia was the third most abundant phylum with ∼10% of sequences and ∼10% of oligotypes in these soils (Table S2). Only two genera ( Table 3) from this phylum were identified. Genus Opitutus was present in tenfold higher numbers than Alterococcus ( Figure S5F). The number of sequences for Opitutus was lower in N-treated soils than in Control. For both Actinobacteria and Verrucomicrobia, horizon and treatment explained 44 and 10-13% of the total variation, respectively (Figures 1C,D).
Chlamydiae constituted 2.5% of the total sequences and ∼10% of oligotypes (Table S2). Three genera were identified in this phylum (Table 3); the relative abundance of oligotypes corresponding to the genus Parachlamydia were highest in N treatment plots relative to control in Org soil (Figure 2B; Figure S5G).
Chloroflexi constituted 2.7% of the total sequences and 6.5 % of oligotypes (Table S2), with only two genera (Ktedonobacter and Thermosporothrix - Figure S5H; Table 3), which were absent in the control treatment in the Org horizon but present in the Min soils (Figure 2A).
Although Firmicutes constituted only about 0.3% of total sequences, with 3.0% oligotypes, 12 genera were identified in this phylum (Table S2; Figure S5J). Most of these were present in very low numbers and were more prevalent in Min soils (e.g., Bacillus and Paenibacillus); all five of the genera seen in the Min soil horizon were absent in the Org soil horizon (Table 6; Figure 2); there was little effect of N treatments ( Figure 1I; Table 5).
Together the phyla TM7, Gemmatimonadetes, and Nitrospira comprised ∼1% of the total sequences and 2.4% of oligotypes www.frontiersin.org All 15 samples from each soil horizon were pooled for these analyses. The iterations were set to 5000. Asterisks Indicates significant correlations (P ≤ 0.05). −Indicates non-significance. NA denotes that analyses could not be performed due to insufficient data. (Table S2). From these three phyla, only two genera were identified; Gemmatimonas in Gemmatimonadetes and Nitrospira in Nitrospira ( Figures S5L,M). The number of sequences representing the genus Nitrospira was greater in Min vs. the Org soil horizon and in LN treatment vs. the control and HN treatment. While both treatment and horizon explained significant variation found in Gemmatimonadetes ( Figure 1J; Table S5) only horizon-specific effects were seen for TM7 ( Figure S3L). NMS generated a four dimensional solution for TM7 (Table S5).
Because of a small dataset, no further analyses were conducted on Nitrospira.
The phyla TM6, Cyanobacteria, Elusimicrobia, WPS-2, and AD3 were not identified in our previous report with the same data. TM6 was represented by only 36 sequences, most of which were filtered out in the initial oligotyping run and, therefore, were not analyzed further (Table S2). The other five phyla together constituted a relatively small portion of the total sequences (∼5.2%, with numbers ranging from 142 to 280) and 17.2% of the total oligotypes (Table S2). Among these phyla, WPS-2 was the most affected by treatment as well as by horizon (Figure 1K). Only the genus Elusimicrobium was identified in Elusimicrobia ( Figure  S5I). Bacteroidetes contained only 259 sequences but 36 oligotypes, and six distinct genera were identified in this phylum; four of which were absent from N-amended soils (Table 6; Figure  S5K). NMS analyses yielded only one dimension solution for this phylum, and thus no figure was generated (Table S5).

DISCUSSION
The primary objective for re-analyses of the pyrosequencing data from the previous study (Turlapati et al., 2013) was to determine additional diversity by applying more efficient and reliable bioinformatics tools. The results enabled us to identify a total of 4534 oligotypes belonging to 15 different bacterial phyla with 73 genera. The same dataset had previously resulted in 6936 OTUs belonging to 11 different bacterial phyla with only 27 identifiable genera. There are two main explanations for this apparent discrepancy: the first being the updating of tools and databases used for classification to newer versions since our previous report; and the second (somewhat obscure) is the difference between the classifiers and databases currently being used for the two clustering methods. Whereas the RDP classifier version 2.2 in QIIME pipeline uses the Greengenes database for OTU-rep classification, the RDP online classifier version 2.7 used for oligotyping has its own built-in RDP database. Although oligotyping and OTU clustering identify similar numbers of taxa (in a comparative study using four phyla constituting ∼28% of total sequences, Table 4), the former has greater resolving ability for classifying nearly identical, closely related organisms provided a good reference genomic library is available to compare against. Using entropy analysis, oligotyping simultaneously clusters multiple sequences based on similarity/identity of each nucleotide along the entire length of the read for all sequences within a given group. However, OTU groups sequences at 97% similarity with a representative sequence. The 3% difference in nucleotides may occur anywhere along the entire length of the read, and that location can vary from sequence to sequence within the group. That is to say, 2 sequences within an OTU can differ from the representative sequence at different nucleotide positions. Additionally, once a sequence has been selected and grouped within one OTU it cannot be assigned to another even if it has greater similarity with the representative sequence of the second OTU. It is this difference in the way sequences are clustered by these methods that makes oligotyping more powerful in grouping closely related organisms. While a relationship between oligotypes and most soil chemistry parameters was observed only a few such relationships were observed for OTU data by the earlier report (Turlapati et al., 2013).
A major limitation of any study involving a soil microbiome (including the present one) is the lack of reference genomes (even for dominant taxa; Howe et al., 2014). Metagenomic outputs of most current high-throughput sequencing technologies (e.g., Illumina) often result in a mixture of multiple genomes most of which do not cover a complete genome of the organisms of interest since complete reference genomes of known organisms are lacking (Simon and Daniel, 2011;Teeling and Glöckner, 2012). Therefore, many studies still rely on universally occurring DNA sequences (either partial or complete), e.g., the 16S rRNA genes. Single cell genomics offers a powerful technique for characterizing the genome of a single organism (Zaremba-Niedzwiedzka et al., 2013;Macaulay and Voet, 2014); however, this is still an emerging technology and is difficult to apply to the microbiome of complex systems like forest soils. In the absence of genome-specific sequence libraries from forest soils, it is difficult to assign the terminal taxonomic identity (e.g., at species level) even to 16S rRNA genes or any other gene sequences. Thus, amplicon-sequencing (although known to be biased against rare organisms) remains a realistic approach to estimate the diversity of large microbial populations in complex environments.
Oligotyping enables the detection and classification of distinct subpopulations within a genus, or even within a single species www.frontiersin.org

Frontiers in Microbiology | Systems Microbiology
February 2015 | Volume 6 | Article 49 | 12 as was shown for Gardnerella vaginalis in humans (Eren et al., 2011). With forest soil microbiomes, although taxonomy at the species level could not be assigned because of the lack of reference genome data, oligotyping did enable us to detect subpopulations within a genus. Furthermore, the distribution of many of these subpopulations often varied between the soil horizons and among long-term treatments with N fertilizer (Table S6).
The results presented here demonstrate the applicability of oligotyping to complex microbiomes of forest soils. However, some adjustments may be necessary to the stringency of parameters that Eren et al. (2013) had suggested. For example, in order to assess the diversity of closely related bacterial populations in an ecosystem by oligotyping of 16S rRNA gene sequences, Eren et al. (2013) emphasized the importance of at least four critical parameters (namely 's,' 'a,' 'A' and 'M') that minimize the impact of sequencing errors in determining the outcome of results. They further summarized that s and M are critical components used to reduce the noise in such analyses. Soil samples have greater microsite variability, bacterial diversity and the occurrence of rare organisms as compared to other microbiomes (e.g., human body and marine waters). All of the data in the present study were analyzed using M values based on two criteria: namely, retaining maximum sequences and the diversity in terms of the number of oligotypes with an s = 2 (due to high microsite variability among five replicates). In the present study, high diversity was evident from a large number of entropy peaks with high values for components for most phylum level datasets ( Figure S1). Therefore, oligotyping of these soil samples required several rounds of supervised (user-defined component selection) analyses before all of the entropy peaks could be decomposed. Lowering the M values from those suggested by Eren et al. (2013), especially for relatively larger datasets led to the identification of more genera at CT ≥0.8 (Table 1). This suggests that many organisms are probably present in low abundance (constituting the rare microbiome) in HF soil. Even with much smaller datasets, where M values of two or three were used, genera were identified at CT ≥0.8 (e.g., genus Aquabacterium of class β-Proteobacteria - Table 1). Huse et al. (2010) suggested that if a sequence occurs in two separate environmental samples (i.e., s = 2), then the chance of it being noise or a technical error is almost zero and thus should be considered as a sequence affiliated with a rare organism. Using this same dataset, Turlapati et al. (2013) earlier reported 4093 singleton OTUs among a total of 11,029 (37%) with s = 1 (default). Therefore, the present classification with oligotyping with s = 2 should be more reliable as compared to OTU clustering in eliminating noise.
The primers used in the present study specifically target the V6-V8 region of 16S rRNA genes and were chosen due to the high sequence variability associated with this region (Brons and van Elsas, 2008). The poor ability of RDP to assign taxonomy to V6 reads at CT ≥0.8 as compared to CT ≥0.5 has been reported by Claesson et al. (2009). Similarly, in this present study, a greater number of genera were identified at CT ≥0.5 vs. CT ≥0.8 (Table  S5). For example, genus Terriglobus (Acidobacteria, Gp1) could only be identified at CT ≥0.5; sequences for this genus were found in all 30 samples and were significantly higher in HN-Org soils as compared to control. These observations suggest that the standard CT value of ≥0.8 at the genus level may need to be adjusted when working with the V6-V8 hypervariable regions of the 16S rRNA gene especially in ecosystems for which reference genome libraries are lacking.
Available analytical tools and public databases, such as RDP, are constantly being updated to meet increasing demand for taxonomic classification arising from high throughput outputs created by next generation sequencing platforms (Cole et al., 2009(Cole et al., , 2013. Mclean et al. (2013) reported 31 candidate phyla including recently identified TM6 in the bacterial population of a hospital sink. In the present study, A total of 16 phyla were identified as compared to 11 in our previous report which used the same dataset in the QIIME pipeline (Turlapati et al., 2013). The RDP classifier assigned phyla names such as AD3, Elusimicrobia, Cyanobacteria, TM6, WPS-2 to the sequences that were termed unclassified in our previous study. No genera were identified within these phyla with the exception of Elusimicrobium. Most importantly, the overall unclassified sequences previously constituting 15-20% of the total were reduced to 0.5% in the current analysis.
Although Acidobacteria constituted >50% of total sequences, only four genera were identified in this phylum. The availability of reference genomes would be useful in further classifying this phylum; however, to date only eight genera have been taxonomically described in this phylum (Männistö et al., 2011 and references therein). Naether et al. (2012) reported that within Acidobacteria Gp1, Gp2, and Gp3 organisms favor nutrient-limited soils as compared to other subgroups. The dominance of these three subgroups of Acidobacteria in both soil horizons at HF suggests that these soils are perhaps nutrient limited. VanInsberghe et al. (2013) reported that in comparison with Proteobacteria, Acidobacteria are more prevalent in soils with low resource availability; our results are in agreement with this report and further reinforce the conclusion that HF soils are nutrient poor. Differences observed between bacterial communities in the Org and Min soil are clearly attributable to the differences in the soil chemistry of the two horizons (Table S3). Although HF soils may be nutrient limited, bacteria in the Org horizon are perhaps adapted to relatively nutrient-rich environment compared to those in the Min horizon. Our results demonstrate that with few exceptions, the Org soil communities were more impacted by N-treatment as compared to the Min soil communities. Compared to Min soil horizon, bacteria in the Org soils demonstrated stronger relationships with most of the soil chemistry parameters ( Table 5). Fierer et al. (2012) also reported greater phylogenetic shifts in microbial communities that prefer a nutrient rich environment following N fertilization. Naether et al. (2012) also reported correlations between edaphic factors such as pH, C, N, C/N ratio, and P with corresponding OTUs (16S clones) and terminal-RFLP (T-RFLP) for most of the subgroups of Acidobacteria found in soil from 30 forested and 27 grassland sites. They found either positive or negative correlations of different OTUs or T-RFLPs within respective subgroups of Acidobacteria over a wide pH and nutrient range. Another study involving 87 soil types with pH values ranging from 3.5 to 8.5 reported an overall inverse relationship between soil pH and the relative abundance of Acidobacteria ). However, a closer look at data shows that within a narrow range of pH from 3.5 to 4.5, this inverse relationship is not held. The pH of www.frontiersin.org HF soil ranged from 3.8 to 4.4 for Org and 4.3 to 4.8 for Min. At HF, a positive correlation between the subgroups of Acidobacteria and pH in this narrow range for each soil horizon indicates that for optimal growth, this group prefers the higher end of this narrow range at HF (Table S3). Our results are in agreement with those of Sait et al. (2006) who found that subgroup Gp1 ideally requires moderately acidic conditions (pH 4.0-5.5).
Despite the existence of significant effects on aboveground foliar Org N metabolites (polyamines and amino acids; Table  S3), tree physiology and productivity (Minocha et al., 2000;Bauer et al., 2004;Magill et al., 2004;Frey et al., 2014), and changes in soil microbial diversity at the HF, the lack of any lasting effects of N-amendment on soil NH 4 and NO 3 concentrations is interesting and apparently contradictory. We speculate that this is due to the combined effects of the fast uptake of the fertilizer (applied only during the growing season) by the macroflora, its rapid conversion into other inorganic N and Org N metabolites (e.g., polyamines and amino acids), and leaching of a significant amount of applied N.
Polyamines are present in all living organisms. They are required for growth and are also involved in stress responses (Minocha et al., 2014). Polyamines and specific-amino acids (e.g., glutamine and arginine) are known to be major N storage metabolites in plants, especially under excess N conditions. Concentrations of these compounds were found to be high in the foliage of trees growing in the N-treated plots at the HF (Minocha et al., 2000;Bauer et al., 2004). Changes in concentrations of polyamines and amino acids were observed in the same soils used for the present study (Frey et al., 2014). These findings suggest that the effects of N-addition on shifts in bacterial community structure may have resulted partially through effects of N-amendments on the growth of the aboveground plant community and vice versa. Also, it can be hypothesized that changes observed in the microbiome are due to the preference of certain microbial taxa for excess N that was present immediately following N application. Then, over the longer term of repeated N applications, they were stabilized and became a major component of the microbiome during the phase when soil inorganic N reverted back to original levels. This argument is supported by the observation that some of the functionally important genera (e.g., Nitrosospira of phylum Proteobacteria, which include wellknown NH 3 oxidizers/nitrifiers) appeared mostly in N-treated soils. Using the amplification of a functional N-transformation gene amoA, He et al. (2007) observed an increased abundance of Nitrosospira sequences in response to N-treatments at a Chinese Agricultural Experimental Station. In our study the sequences and oligotypes corresponding to Nitrospira were higher in LNtreated Min soil. However, using 16S markers, Wertz et al. (2012) observed no change in the abundance of sequences for Nitrospira (another potential nitrifier group within the phylum Nitrospira) in response to long term N-fertilization at five forested sites in British Columbia, Canada.

CONCLUSION
A total of 46 previously unidentified genera were recognized by oligotyping vs. OTU clustering analysis of PCR-amplified partial 16S rDNA sequences from HF soil DNA. Because of the lack of a reference genome database for forest soils, both clustering approaches yield limited information at the genus and species level; however, oligotyping enables reliable classification of closely related organisms because of the high stringency of this tool. This analytical approach further revealed strong correlations between soil chemistry and oligotypes; no such correlations were discernible with the OTU clustering approach. Based on the fact that we could identify several genera at CT ≥0.98 using a relatively lower M value, we suggest that lowering M values may be appropriate for the complex microbiomes such as forest soils that are comprised of an enormous diversity of bacteria that are often present in low abundance. As suggested by Mantel test results, bacterial communities in the Org soil at HF have high preference for a nutrient rich environment and the communities found in the Min soil are better adapted to nutrient poor conditions. Overall, effects of N-treatment on the microbiome were more evident in the Org soil than the Min soil horizon, perhaps due to the fact that N utilization requires an abundance of C, which three times higher in the Org as compared to Min soil.