An Analysis of Thaumarchaeota Populations from the Northern Gulf of Mexico

We sampled Thaumarchaeota populations in the northern Gulf of Mexico, including shelf waters under the Mississippi River outflow plume that are subject to recurrent hypoxia. Data from this study allowed us to: (1) test the hypothesis that Thaumarchaeota would be abundant in this region; (2) assess phylogenetic composition of these populations for comparison with other regions; (3) compare the efficacy of quantitative PCR (qPCR) based on primers for 16S rRNA genes (rrs) with primers for genes in the ammonia oxidation (amoA) and carbon fixation (accA, hcd) pathways; (4) compare distributions obtained by qPCR with the relative abundance of Thaumarchaeota rrs in pyrosequenced libraries; (5) compare Thaumarchaeota distributions with environmental variables to help us elucidate the factors responsible for the distributions; (6) compare the distribution of Thaumarchaeota with Nitrite-Oxidizing Bacteria (NOB) to gain insight into the coupling between ammonia and nitrite oxidation. We found up to 108 copies L−1 of Thaumarchaeota rrs in our samples (up to 40% of prokaryotes) by qPCR, with maximum abundance in slope waters at 200–800 m. Thaumarchaeota rrs were also abundant in pyrosequenced libraries and their relative abundance correlated well with values determined by qPCR (r2 = 0.82). Thaumarchaeota populations were strongly stratified by depth. Canonical correspondence analysis using a suite of environmental variables explained 92% of the variance in qPCR-estimated gene abundances. Thaumarchaeota rrs abundance was correlated with salinity and depth, while accA abundance correlated with fluorescence and pH. Correlations of Archaeal amoA abundance with environmental variables were primer-dependent, suggesting differential responses of sub-populations to environmental variables. Bacterial amoA was at the limit of qPCR detection in most samples. NOB and Euryarchaeota rrs were found in the pyrosequenced libraries; NOB distribution was correlated with that of Thaumarchaeota (r2 = 0.49).


INTRODUCTION
The Mississippi River outflow forms a surface plume up to 10 m thick upon entering the northern Gulf of Mexico. Stratification and nutrient (especially nitrogen) enrichment of river water (Turner et al., 2006) lead to elevated primary production in the plume and thus to increased organic matter deposition 10 to 100 km away from river discharge sites (Rabalais et al., 2002;Green et al., 2008). Decomposition of this organic matter is thought to contribute to the formation of a recurrent hypoxic zone in the northern Gulf of Mexico that profoundly affects the ecology, fisheries biology, and geochemistry of the region (Rabalais et al., 2002;Dagg et al., 2007;Cai et al., 2011). Intermittent hypoxia ([O 2 ] ≤ 2 mL/L or ∼90 µM; Diaz and Rosenberg, 2008) begins to develop in February and typically is most pronounced from mid-May to mid-September (Rabalais et al., 2010).
Processes such as coupled nitrification/denitrification that remove excess fixed nitrogen affect primary production and thus may be important determinants of the extent and duration of hypoxia. Ammonia oxidation is the first step in the biogeochemical pathway leading to denitrification. Members of the βand γsubdivisions of the Proteobacteria (Ammonia-Oxidizing Bacteria, AOB) and Marine Group 1 Archaea (Ammonia-Oxidizing Archaea, AOA) can grow chemoautotrophically by oxidizing ammonia to nitrite (Ward, 2011). The nitrite produced can be oxidized further to nitrate by Nitrite-Oxidizing Bacteria (NOB) and then denitrified (Jetten, 2001;Francis et al., 2007;Ward et al., 2009).
One of the goals of the present study was to quantify the distribution of AOA in the northern Gulf of Mexico in the area influenced by the Mississippi River plume and recurrent hypoxia. We hypothesized that ammonia oxidizers would be abundant there because of the high riverine nitrogen loading to the region and the importance of respiration (Cai et al., 2011), and thus presumably nitrogen regeneration, in the region experiencing hypoxia. We also hypothesized that AOA would dominate ammonia oxidizer populations at pelagic stations, although AOB were found to be more abundant than AOA in sediments from Weeks Bay, Alabama (Caffrey et al., 2007). To test these hypotheses, we determined AOA and AOB distributions by quantitative PCR (qPCR) measurements of the abundance of rrs and amoA genes. We also pyrosequenced rrs genes from our samples as an independent check on distributions based on qPCR data. A second goal was to analyze variation in sequences of rrs and compare this to genes from two metabolic pathways that are important to AOA, ammonia oxidation and carbon fixation, to provide a more highly resolved description of the composition of Thaumarchaeota populations than can be obtained from analyses of single genes. AOA can grow autotrophically (Könneke et al., 2005) using the 3-hydroxypropionate/4-hydroxybutyrate pathway (Berg et al., 2007). The potential for AOA autotrophy can be detected in the environment using primers targeting the genes in this pathway, notably acetyl-CoA/propionyl-CoA carboxylase (accA; Yakimov et al., 2009) and 4-hydroxybutyryl-CoA dehydratase (hcd; Offre et al., 2011). We tested both of these primer sets with our samples. We compared the phylogenetic diversity present in their amplicons with diversity represented in amplicons from more widely used primer sets for amoA and rrs. We then used rrs sequences from the pyrosequencing effort to extend phylogenetic inferences based on analyses from samples taken at one station more broadly across the study area. A third goal was to investigate the relationship between Thaumarchaeota distributions and environmental variables to provide insight into the factors controlling their distribution. Pyrosequencing data were also used to compare the distribution of NOB with AOA to gain insights into the coupling between these two steps of nitrification.

SAMPLE COLLECTION AND DNA EXTRACTION
Samples were collected during the R/V Cape Hatteras GulfCarbon 5 cruise in the northern Gulf of Mexico (30˚07 N, 088˚02 W to 27˚39 N, 093˚39 W; Figure 1) from March 10-21, 2010. Samples were collected using Niskin bottles and a General Oceanics rosette sampling system equipped with an SBE25 CTD and sensors for [O 2 ], beam attenuation (turbidity), and relative fluorescence (calibrated to chlorophyll a equivalents). The [O 2 ] sensor was cross-calibrated against Winkler titrations of [O 2 ] in samples collected at fixed depths. pH data were collected using a glass electrode by W.-J. Huang of Dr. W.-J. Cai's group. Euphotic depth (defined as 1% PAR, 400-700 nm) was calculated for each station from Aqua MODIS satellite data using an average of the Lee and Morel models 1 by H. Reader and C. Fichot. Nutrient data were collected at some of the station/depths we sampled by Dr. S. Lohrenz's group. Since nutrient sample collections were biased in favor of near-surface samples on the continental shelf, these data were used only in BEST analysis (see Appendix). Approximately 1 L of water from each Niskin bottle was pressure filtered (at ∼60 kPa) through 0.22 µm Durapore filters (Millipore); filters were frozen in 2 mL of lysis buffer (0.75 M sucrose, 40 mM EDTA, 50 mM Tris; pH 8.3). DNA was extracted by enzymatic hydrolysis with lysozyme (50 mg mL −1 ), proteinase K (20 mg mL −1 ), and sodium dodecyl sulfate (100 µL of a 10% solution), and then purified by phenol-chloroform extraction as described previously (Bano and Hollibaugh, 2000).

QUANTITATIVE PCR
Quantitative PCR was performed using an iCycler iQ™Real-Time qPCR detection system (Bio-Rad) and the primers listed in Table A1 in Appendix. qPCR reactions were run in triplicate with standards made from environmental amplicons as described in the "Methods" in Appendix. TaqMan® (Applied Biosystems) chemistry was used to detect amplification of Bacteria and Thaumarchaeota 16S rRNA genes (rrs) following Kalanetra et al. (2009); all other amplifications were detected using SYBR® Green Supermix (Bio-Rad). We compared two primer sets for detecting Archaeal amoA: Arch-amoA-for and Arch-amoA-rev ("Wuchter primers"; Wuchter et al., 2006) and ArchamoAF and ArchamoAR ("Francis primers"; Francis et al., 2005). Reactions using the Wuchter primers were set up as described in Kalanetra et al. (2009), while PCR conditions for the Francis primers followed Santoro et al. (2010), except that SYBR® Green Supermix (Bio-Rad) was used with no additional MgCl 2 . Amplification of pSL12 rrs followed Mincer et al. (2007), with the number of amplification cycles reduced to 40 to prevent quenching of the fluorescence signal. Archaeal accA genes were amplified following Yakimov et al. (2009) with shorter cycle lengths . Specificity of SYBR® Green reactions was confirmed by melting curve analysis; accA amplicons were also checked by sequencing clones created with qPCR primers Crena_529F and Crena_981R (Yakimov et al., 2009). We also tested published primers for hcd genes (Offre et al., 2011), but found that non-specific amplification rendered them unsuitable for qPCR with our samples (see Appendix). Inhibition of qPCR reactions was tested using dilutions of DNA 10-1,000× with the Bacterial rrs qPCR assay; samples that showed higher copy number than expected from typical dilution were determined to have PCR inhibitors present and run at the dilution which gave the highest copy number for all other gene assays. Calculations of gene abundance and ratios are discussed in the "Methods" in Appendix, and qPCR efficiencies for reactions are reported in Table A1 in Appendix.

PHYLOGENETIC ANALYSIS
We sequenced cloned rrs, amoA, and accA amplicons to obtain phylogenetic descriptions of the Thaumarchaeota populations in the study area and to verify specificity of qPCR reactions. Libraries were generated from samples collected at Station D5, located on the southern edge of the area influenced by the Mississippi River plume and over the continental slope (Figure 1) using methods described previously (Kalanetra et al., 2009) and summarized below. This station was chosen for its depth and as representative of slope stations influenced by hypoxia. We compared samples from different depths at this station as others (e.g., Lam et al., 2007;Beman et al., 2008;Kalanetra et al., 2009;Church et al., 2010;Santoro et al., 2010) have shown segregation of Thaumarchaeota populations by depth. rrs and amoA were amplified from DNA collected at 100 and 200 m, while accA amplicons were generated from samples collected at 2, 50, 100, 200, and 450 m to test the accA primer set across a wider depth range. PCR amplifications of Archaeal rrs, amoA, and accA used the primers listed in Table A1 in Appendix. Three separate amplifications were pooled to minimize potential PCR bias and electrophoresed on a 1% agarose gel. The band of the expected DNA product size was excised, extracted and purified using the QIAquick® Gel Extraction Kit (QIAGEN), and incorporated into a TOPO 4 vector (Invitrogen) prior to cloning using chemically competent TOP10 E. coli cells with the TOPO TA cloning kit (Invitrogen) following the manufacturer's instructions. Clones from each library were selected randomly and sequenced (Genewiz, Inc.) using the plasmid primer M13F(−21). Euryarchaeota rrs sequences were identified by BLAST (Zhang et al., 2000) and not analyzed further.

PYROSEQUENCING ANALYSES
We also analyzed the distribution of ribotypes in 41 of our 52 samples by massively parallel sequencing (pyrosequencing) using a Roche 454/FLX instrument running Titanium chemistry. rrs in DNA extracted from our samples were amplified by PCR using universal rrs primers 515F and 806R (Table A1 in Appendix), modified for bar-coded pyrosequencing. PCR protocols and primer sequences, including barcodes, adaptors, and linkers, followed Bates et al. (2011). Purified DNA from three reactions for each sample was pooled to produce a mixture in which amplicons from each sample were represented equally. The final mixture was sequenced using standard protocols by Engencore (University of South Carolina, Columbia, SC, USA). Sequence data have been deposited with MG-RAST 4 at accession numbers 4509220.3-4509263.3. Metadata are available via the project page: "Analysis of composition and structure of coastal to mesopelagic bacterioplankton communities in the nGoM." A total of 435,290 sequences were filtered and trimmed (minimum length 200 bp, minimum quality score 20; 221,410 sequences passed) and then sorted into OTUs using the PANGEA pipeline (Giongo et al., 2010). Phylogenetic affiliations of these sequences were determined by a megablast analysis using a reference set of more than 170,000 rrs sequences from described isolates obtained from the RDP II database (Giongo et al., 2010). Amplicon sequences were binned into OTUs at domain, phylum, class, order, family, genus, and species levels based on megablast results, and then grouped into phylogenetic clusters and sorted by station and depth (average number of sequences per sample: 5,400; range 764-9,176). The PANGEA pipeline assigns all Archaea sequences to one group that also includes divergent Bacteria sequences. In order to more accurately assess the proportion of Thaumarchaeota in our samples, we manually enumerated hits to Thaumarchaeota in the megablast output for each sample. We also counted hits to known AOB, NOB, and Euryarchaeota.
Thaumarchaeota rrs sequences obtained from pyrosequencing were included for phylogenetic analysis using mothur (v. 1.21.1; Schloss et al., 2009). Unique sequences were grouped together and aligned against the Silva Archaea reference database 5 . The resulting alignment, including rrs sequences from Station D5 clone libraries and outgroups, was trimmed to a set length and eight chimeric sequences were removed with Uchime (Edgar et al., 2011); additional potential chimeras and erroneous sequences were checked manually using BLAST and removed if necessary. The remaining 23,677 Thaumarchaeota sequences were clustered and representatives from each OTU obtained. A maximum likelihood tree was constructing using representative sequences grouped at 98% similarity (2,772 sequences total) with the RAxML program (Stamatakis et al., 2005) within ARB (Ludwig et al., 2004); 100 trees were generated using rapid bootstrap analysis, and the consensus tree was constructed from these iterations. Rarefaction analysis 5 http://www.mothur.org/wiki/Silva_reference_files was completed using mothur as described for clone library samples above. The Bacteria populations of these samples are analyzed in King et al. (2013).

STATISTICAL ANALYSES
Model II ordinary least squares pairwise regressions were calculated following Legendre and Legendre (1998) using software available at the R-Project web site 6 . Coefficients of determination and confidence limits of regression equations were calculated from 999 bootstrap permutations. PRIMER (v.6;Clarke and Gorley, 2006) was used to compare environmental and biological data from each station. We normalized environmental data in PRIMER to reduce the influence of variable unit scales before principal components analysis (PCA). The software package CANOCO (v. 4.5;ter Braak and Šmilauer, 2002) was used for canonical correspondence analysis (CCA; ter Braak, 1986) using PCA values and log-transformed qPCR gene abundances. Significance of CCA was determined using 499 Monte-Carlo permutations (reduced model) as recommended in the program documentation. The RAxML tree constructed from 454-generated Thaumarchaeota rrs sequences was used in Fast UniFrac (Hamady et al., 2009) to investigate phylogenetic patterns by sample location and depth. Weighted abundances of sequences within samples were used in both Principal Coordinates Analysis (PCoA) and sample clustering, as well as to calculate pairwise Unifrac distances. Counts were normalized to reduce the influence of larger sample sizes (greater number of sequences) at certain stations. The significance of sample clusters was tested using 100 jackknife permutations and resampling of the minimum (2), first quartile (100), or median (520) number of sequences across all samples; any sample containing less than the number of re-sampled sequences was eliminated from the analysis.

GENE ABUNDANCE AND DISTRIBUTION
The abundance of Bacterial rrs in these samples ranged from 10 5 to 10 10 copies L −1 ( Table 1; Table A2 in Appendix). Thaumarchaeota 6 http://cran.r-project.org/web/packages/lmodel2/index.html    Francis et al. (2005) amoA primer set. "Near-surface" is ≤100 m depth; "deep" is >100 m depth; "inshore," over the continental shelf (seafloor depth < 100 m); "offshore," shelf break and beyond (depth > 100 m). Gene abundances are given as copies L −1 of sample filtered as determined from triplicate qPCR amplifications of Archaeal and β-Proteobacterial amoA and Archaeal accA (left) and Thaumarchaeota, pSL12, and Bacterial rrs (center); note that scales for β-Proteobacterial amoA and pSL12 rrs are reduced by 10-100 to allow for visualization of variation with depth. Environmental data were taken from a CTD attached to the frame of the rosette sampler (right). Sampling depths are shown as X's on the depth axis; missing points indicate that the measurement was below the limit of detection (see Table A1 in Appendix for detection limits).

Frontiers in Microbiology | Aquatic Microbiology
rrs genes were present in the same samples at up to 10 8 copies L −1 ( Table A2 in Appendix) with population maxima occurring typically between 100 and 200 m depth and at lower [O 2 ] and temperature (Figure 2). The abundance of rrs genes attributable to the pSL12-like clade was much lower, near the limit of detection (see Table A1 in Appendix) in most samples with a maximum abundance of 10 5 copies L −1 ( Table A2 in Appendix). Similar trends with depth for pSL12 rrs were observed as Thaumarchaeota rrs, www.frontiersin.org though pSL12 rrs abundance was generally 100-to 10,000-fold lower (Figure 2), except in one sample (Station H1-7 m), where pSL12 rrs was 10% of Thaumarchaeota rrs. No Thaumarchaeota rrs were detected at the freshwater Mississippi River station (MR1-2 m) where pSL12 rrs was present at 10 5 copies L −1 ( Table A2 in Appendix). Thaumarchaeota accounted for a high proportion (up to 40% by qPCR, up to 54% of pyrosequenced rrs) of the total prokaryotic community in our samples. This percentage varied with depth (Figure 3), with deeper (>100 m) samples containing an average of 21% Thaumarchaeota (range 0.5-40%) while samples from near-surface water (≤100 m) contained only 1.8% Thaumarchaeota (range 0-9%). Differences were also observed with distance from shore, with shallower (<100 m) samples from inshore stations having fewer Thaumarchaeota than those from offshore stations (1.1 versus 2.8% of prokaryotes, respectively). Pyrosequencing also showed that Thaumarchaeota rrs genes were most abundant in samples from depths of 100-200 m, though they were present at low abundances in all samples with the exception of MR1-2 m ( Table A3 in Appendix), in agreement with qPCR analyses. Thaumarchaeota accounted for 0.1-54% of the prokaryotes in pyrosequencing libraries and their distributions based on qPCR estimates of gene abundance compared favorably with the contribution of Thaumarchaeota ribotypes to pyrosequenced rrs libraries from these samples (Figure 4; model II regression, n = 41, r 2 = 0.82, 95% CL of slope = 0.54-0.73).
Archaeal amoA was present at up to 10 8 copies L −1 (Table 1; Figure 2; Table A2 in Appendix). Bacterial amoA was at the limit of detection ( Table A1 in Appendix) in most samples, with a maximum of 10 6 copies L −1 . The ratio of AOA:AOB amoA was found on average to be 2100:1 (Wuchter primers) to 3300:1 (Francis primers). The ratio of Bacterial amoA:Bacterial rrs averaged 0.001 across all samples, with a maximum of 0.05 at Station D3-68 m ( Figure A5A in Appendix). Abundances of accA genes ranged from the limit of detection (10 4 copies L −1 ) to 10 7 copies L −1 (Table 1; Figure 2; Table A2 in Appendix). Archaeal amoA (quantified using Wuchter primers) showed similar distribution by depth as Thaumarchaeota rrs (Figure 2). However, accA abundances showed opposite trends with depth, leading to higher ratios of amoA:accA or rrs:accA in near-surface (≤100 m) water (Figure 2; Table 2).

FIGURE 3 | Abundance of Thaumarchaeota as a percentage of total bacterioplankton plotted against sample depth.
We used PCA (Figure A3 in Appendix) to identify samples from similar environments and group them into a few categories to simplify comparisons. The first two PCA axes explained 63.2% of the variation between samples ( Figure A3; Table A5 in Appendix), which supported placing stations into three groups: near-surface inshore, near-surface offshore, and deep offshore sets. CCA was included (Figure 8) to investigate relationships between gene abundances and environmental conditions (similar to BEST analysis, see Appendix). The primary CCA axis (CCA1) explained 47.9% of the gene abundance-environment relationship; adding the second axis (CCA2) increased the variance explained by 44% (91.7% total; Figure 8; Table A6 in Appendix). A global permutation test gave a statistical significance of p < 0.05 for station groupings based on both canonical axes considered together (F = 2.26, p = 0.014), while CCA1 considered alone did not explain the gene abundance-environment relationship (F = 8.43, p = 0.086). Thaumarchaeota rrs abundance was negatively correlated with most environmental variables, except for salinity and depth (Figure 8). Bacterial rrs abundance correlated positively with euphotic zone depth and had a strong negative  Frontiers in Microbiology | Aquatic Microbiology correlation with pH, with little influence from any variable primarily contributing to CCA2 (beam attenuation, oxygen; Figure 8). The distribution of Archaeal amoA genes as assessed with the Wuchter primers, in contrast, was not strongly influenced by variables contributing to CCA1 (fluorescence, pH, latitude, longitude; Figure 8) but showed a weak positive correlation with temperature and beam attenuation (turbidity). Archaeal amoA gene abundance assessed by the Francis primers showed the opposite trend, with strongest positive correlations to latitude (which covaries with distance offshore and depth in this region) and oxygen concentrations (Figure 8). Bacterial amoA gene abundance correlated with beam attenuation (turbidity) and temperature (positive correlation), as well as depth (negative correlation). accA gene abundance had strong positive correlations with relative fluorescence (chlorophyll a equivalents) and pH (Figure 8).

THAUMARCHAEOTA COMMUNITY COMPOSITION AT STATION D5
Phylogenetic analysis of 67 Sanger-sequenced Thaumarchaeota rrs sequences obtained from 100 and 200 m depth at Station D5 revealed 10 different OTUs ( Figure 5; Table A4 in Appendix; 98% similarity cutoff). All but one of the sequences retrieved from the 100 m sample clustered into a single OTU (the "Near-Surface Group," Figure 5), that also contained one sequence retrieved from the 200 m sample and the reference sequence from Nitrosopumilus sp. NM25 (AB546961; Matsutani et al., 2011). We did not retrieve any sequences related to the marine pSL12like clade. Sequences retrieved from the 200 m sample displayed greater richness and evenness ( Table A4 in Appendix; 9 OTUs) and included some OTUs that appear unique to the northern Gulf of Mexico. We retrieved 184 amoA sequences from Station D5. Phylogenetic analysis of the translated and aligned amino acid sequences revealed two OTUs (similarity cutoff of 97%) of AmoA ( Figure 6A): one containing primarily near-surface (100 m) sequences ("Group A" following Beman et al., 2008) and the other dominated by sequences from 200 m ("Group B"). amoA nucleotide sequences also grouped primarily by depth, but with greater richness and diversity ( Table A4 in Appendix) at a given depth than we observed for Thaumarchaeota rrs genes. Clusters of sequences that appear to be unique to the Gulf of Mexico were observed in both 100 and 200 m samples ( Figure A1A in Appendix).
The top BLASTx hits for all but 30 of 257 sequences obtained from accA amplicons were to carboxylase or carboxyltransferase genes from Archaea. The remaining 30 amplicons were most similar to non-Thaumarchaeota reference sequences with low (≤65%) sequence identities. Because they did not return hits to Thaumarchaeota reference sequences, we did not consider them further. Phylogenetic analysis of the inferred amino acid sequences for AccA ( Figure 6B) revealed three major OTUs: OTU 1 contained a majority of near-surface sequences (2, 50, and 100 m), while OTUs 2 and 3 contained mostly sequences from deep water (200 and 450 m). Analysis of accA nucleotide sequences revealed similar clusters with depth as inferred amino acid sequences for AccA and Thaumarchaeota rrs gene sequences (Figure A1B in Appendix) with a total of 51 OTUs observed at a 97% similarity cutoff ( Table A4 in Appendix). Some of these seem unique to the Gulf of Mexico (Figure A1B in Appendix), but this may be an artifact of the limited representation of accA sequences in reference databases.

PYROSEQUENCING: PHYLOGENETIC PATTERNS AND SAMPLE GROUPINGS
Microbial community composition varied dramatically with depth as shown by comparisons of libraries from surface (≤25 m depth) versus subsurface (≥100 m depth) samples ( Figure A2 in Appendix, Table A3 in Appendix; these data are discussed fully in King et al., 2013). Proteobacteria, especially αand γ-Proteobacteria, dominated the microbial community of nearsurface waters at most stations. Consistent with distributions of rrs and amoA indicated by qPCR analyses, Thaumarchaeota were greatly enriched in deeper waters. Only 14 (out of a total of 221,410) rrs sequences binned to AOB, confirming the much lower abundance of AOB relative to AOA found by qPCR quantification of amoA. Half of the AOB sequences were retrieved from one sample: MR1-2 m, taken upstream of the mouth of the Mississippi River with a salinity of 0. Only four Thaumarchaeota sequences were retrieved from this sample ( Table A3 in Appendix), two of which were most similar to the terrestrial thaumarchaeota, "Candidatus Nitrososphaera gargensis" strain EN76, at 15% similarity.
Sequences most closely related to NOB were retrieved from most samples (mean = 0.4%, range 0-1.8% of prokaryotes as calculated in "Methods" in Appendix, but assuming 2 rrs per NOB genome from Mincer et al., 2007). These sequences were primarily identified as Nitrospina sp. 3005 (AM110965), though Nitrospira ribotypes were also detected. The abundance of NOB rrs was greatest at depth (∼200 m, Table A3 in Appendix, Figure 7A) and was significantly correlated with the abundance of Thaumarchaeota in the same samples (Figure 7A; model II regression, n = 41, r 2 = 0.49, 95% CL of slope = 0.032-0.064). Euryarchaeota only accounted for a few percent of the microbial community (mean 5.8%, range 0.1-17.6%). Euryarchaeota were most abundant in near-surface samples (<100 m; Table A3 in Appendix) and their abundance was poorly correlated with the abundance of Thaumarchaeota ( Figure 7B; model II regression, n = 41, r 2 = 0.14, 95% CL of slope = 0.021-0.20).
UniFrac distances calculated between samples indicate significant (p ≤ 0.05) similarities in Thaumarchaeota rrs assemblages among offshore, near-surface samples and inshore, near-surface samples from Stations A2, A4, D3, E2, and MR2 (data not shown). The Station D5-100 m sample was assigned to the near-surface group (p ≤ 0.05) regardless of the method used to obtain rrs sequences (pyrosequencing versus Sanger sequencing from clone libraries). Among deep offshore samples, those from 160-950 m were similar to each other (p ≤ 0.05); sequences from clone libraries generated from Station D5-200 m were also included in this group. The phylogenetic composition of Thaumarchaeota rrs in the deepest sample, Station A6-1700 m, was only similar to samples from D5-900 m and F6-950 m (p ≤ 0.05).
Analysis of phylogenetic patterns across samples using PCoA in Fast UniFrac (Figure 9) revealed two major groups of pyrosequenced Thaumarchaeota rrs -one of deep (>100 m) samples and another including the near-surface samples (both inshore and offshore), which agrees with PCA groupings (Figure A3 in Appendix). The primary PCoA axis explained 70% of the variation in phylogenetic composition of the samples, with the secondary axis explaining an additional 11% (total 81%) of the variation. The  Triangles, near-surface (≤100 m) samples; squares, Deep (>100 m) samples. Lines are model II regressions (Legendre and Legendre, 1998) of all data. sample from Mississippi River Station MR1 was an outlier; however, PCoA analysis with this sample included revealed the same general pattern (Figure A7 in Appendix). Samples clustered using the minimum resampling of 2 sequences (Figure A4A in Appendix) only showed significant separation of Station MR1 sample from the rest of the samples (>99.9% jackknife support). For 100 re-sampled sequences (32 of 43 samples; Figure A4B in Appendix), a clear separation was observed between surface and deep samples (60% support) and between near-surface inshore samples (excluding Station A4) and near-surface offshore samples (>99.9% support). When the median number of sequences was applied to cluster analysis (520 sequences, 22 of 43 samples; Figure A4C in Appendix), the separation of deep and near-surface samples was statistically significant (>99.9% support). Station D3 (inshore, <100 m depth) samples clustered most closely (>99.9% support), followed by inshore Station A4-43 m and offshore Station A6-80 m (95% support). Amongst deep samples, a further separation was observed within the deep offshore samples, with the deepest samples (Stations D5-900 m and F6-950 m) and those from 350-760 m forming distinct clusters 50 and 61% of the time, respectively (Figure A4C in Appendix).

COMMUNITY COMPARISONS
We found a strong correlation between qPCR and pyrosequencing estimates of AOA relative abundance indicating that, despite potential biases associated with individual qPCR primers, qPCR estimates of Thaumarchaeota distributions at this coastal site are robust. Thaumarchaeota were abundant in deeper waters of the northern Gulf of Mexico, increasing in abundance with depth to a broad maximum between ∼200 and 800 m (Figures 2 and 3), coinciding with the oxygen minimum (Figure 2). Two shallow water stations (C1, 12 m; MR2, 8 m) contained up to 10 8 copies L −1 of Thaumarchaeota rrs; both of these stations are near the Mississippi River Plume, which may indicate an influence of riverine nutrients on AOA. It is important to note, however, that these are marine ribotypes and not terrestrial or freshwater ribotypes carried into the Gulf by the Mississippi River, since we did not retrieve similar ribotypes from Mississippi River sample MR1. In contrast, AOB amoA genes were below the limit of detection except in a few near-surface samples from inshore stations (Stations C1, D3, D5, G1, and H1) and in river stations MR1 and MR2. Consistent with many other studies of amoA in coastal water columns (Wuchter et al., 2006;Herfort et al., 2007;Beman et al., 2010), AOA amoA was always >10to 100-fold more abundant than AOB amoA. The relative abundance of Thaumarchaeota and AOB rrs in pyrosequenced libraries ( Table A3 in Appendix) is consistent with the distribution of amoA genes determined by qPCR, suggesting that the observed ratio of AOA:AOB amoA is not an artifact of primer bias. Although we do not have ammonia oxidation rate measurements for these samples, the greater abundance of AOA than AOB amoA suggests that Thaumarchaeota are likely to dominate nitrification in this region (Beman et al., 2008).
We did not quantify the distribution of NOB by qPCR (cf. Santoro et al., 2010, which is limited to Nitrospina); however, we were able to determine the distribution of all known NOB relative to Thaumarchaeota from pyrosequenced rrs libraries. We found that NOB abundance correlated well with that of Thaumarchaeota (r 2 = 0.49), as reported by others (Mincer et al., 2007;Santoro et al., 2010). The correlation between the distributions of these two groups suggests relatively tight coupling between them, presumably leading to efficient conversion of ammonia to nitrate in the northern Gulf of Mexico. However, NOB rrs abundance was only ∼5% of that of Thaumarchaeota (slope of model II regression; Figure 7A), in contrast to estimates of 20-100% reported by Mincer et al. (2007) or ∼25% reported by Santoro et al. (2010). This ratio would change if the rrs gene dosages we used in our calculations changed; however, the discrepancy suggests that alternative pathways, e.g., anammox, might be more significant for nitrite removal in the northern Gulf of Mexico than in the temperate Pacific upwelling zone sampled by Mincer et al. (2007) and Santoro et al. (2010).

ENVIRONMENTAL FACTORS
The connection between pH and AOA abundance has been examined closely in soils, where Archaeal amoA typically dominates in www.frontiersin.org more acidic samples (reviewed in Prosser and Nicol, 2008;Erguder et al., 2009). The Mississippi River plume is a site of respirationinduced acidification (Cai et al., 2011), and we observed a negative correlation between the abundance of Thaumarchaeota rrs and pH in our samples. In contrast, the abundance of Archaeal accA genes and of AOA amoA genes detected by the Francis primers was positively correlated with pH values (Figure 8). AOB amoA abundance was positively correlated with temperature and negatively correlated with depth, while AOA amoA abundance showed the opposite trends (Figure 8). These correlations correspond to AOB abundance being greatest in surface samples, versus AOA abundance being greater in samples from deeper, colder water, as observed in other studies (e.g., Santoro et al., 2010). We also observed a strong negative correlation between AOB amoA gene abundances and salinity, but we did not find a statistically significant (p > 0.05) correlation between AOA amoA genes and salinity. This contrasts with AOA distributions reported for sediments from an aquifer at Huntington Beach, CA, USA (Santoro et al., 2008) or from the San Francisco Bay Estuary (Mosier and Francis, 2008), where AOB were more abundant in high salinity sediments, while AOA were more prominent in low salinity environments.
Fluorescence (chlorophyll a) contributed significantly to PC1 (Figure A3 in Appendix) and accA, pSL12 rrs, and Archaeal amoA gene abundance (Francis primers) were all positively correlated with fluorescence in CCA analysis (Figure 8). Most other studies have reported inverse correlations between Thaumarchaeota abundance and chlorophyll a (Murray et al., 1999a,b;Wells and Deming, 2003;Kirchman et al., 2007). A study of AOA and AOB dynamics in estuarine sediments, though, showed that potential nitrification rates and the abundance of Archaeal amoA genes (Wuchter primers) correlated positively with sediment chlorophyll a concentrations (Caffrey et al., 2007). Archaeal abundance in the Arctic Ocean near the Mackenzie River mouth correlated FIGURE 8 | Canonical correspondence analysis (CCA) ordination plot of qPCR-estimated abundances for rrs, amoA, and accA genes and environmental data. The length and angle of arrows shows the contribution of a particular environmental variable to the CCA axes. Fluorescence, relative fluorescence, chlorophyll a equivalents; beam attenuation, turbidity. Eigenvalues, correlation values, and percentage variance for CCA are given in Table A6 in Appendix. positively with chlorophyll a (Wells et al., 2006), although a previous study at similar sites showed the opposite trend (Wells and Deming, 2003). We observed a strong positive correlation between Bacterial amoA abundance and turbidity in the Gulf of Mexico while Archaeal amoA genes were inversely correlated with turbidity (Figure 8). We detected greatest abundances of AOB amoA genes in shallow, near-shore waters (especially at Station C1 and all three Mississippi River stations), which may indicate a salinity effect or an association of AOB with particles originating from estuaries, coastal embayments, or the river. Since we did not sequence the AOB amplicons we obtained, we cannot use the phylogenetic position of the AOB to differentiate between these hypotheses (e.g., Phillips et al., 1999;O'Mullan and Ward, 2005). Caffrey et al. (2007) reported that AOB were more abundant than AOA in sediments from Weeks Bay, Alabama, a subembayment of Mobile Bay. Our near-shore waters also had higher ammonia concentrations (up to 3 µM; data not shown) than at other stations, which is consistent with the conceptual model that AOB are more competitive in environments with elevated ammonia concentrations (Martens-Habbena et al., 2009).
Oxygen concentrations are typically higher in surface than deep water, especially in this region of the Gulf of Mexico where bottom waters become seasonally hypoxic (Rabalais et al., 2002(Rabalais et al., , 2010. Although samples for this study were collected before hypoxia had fully developed ([O 2 ] ranged from 3.5 to 8.4 mg L −1 ; 150-375 µM), we found clades of AOA similar to those observed in other hypoxic waters (Beman et al., 2008;Labrenz et al., 2010;Molina et al., 2010). Additionally, we determined that the distribution of amoA phylotypes detected by the Francis primers correlated positively with [O 2 ] (as did Archaeal accA genes), while those detected by the Wuchter primers were not correlated with [O 2 ] (Figure 8). Our data suggests that these primer sets have different PCR biases such that certain AOA ecotypes are amplified more efficiently by one set than the other. As we observed correlations between different environmental variables and amoA phylotypes amplified by each primer, we believe these differences may reflect ecotype-specific sequence variation, as proposed for the two primer sets given in Beman et al. (2008).

amoA AND accA ABUNDANCE
The abundance of Archaeal amoA genes reported in this study (up to 10 8 copies L −1 ) is comparable to abundances reported for other continental shelf regions (Galand et al., 2006;Mincer et al., 2007;Kalanetra et al., 2009;Santoro et al., 2010), in the mesopelagic Pacific Ocean (Church et al., 2010), and in hypoxic zones (Beman et al., 2008;Molina et al., 2010). Differences in estimates of amoA abundance depended on the primer set used. Previous studies using the Wuchter primers reported low abundance of amoA relative to rrs in deep waters (Agogué et al., 2008;De Corte et al., 2009) compared to studies that used the Francis primers (Beman et al., 2010;Church et al., 2010;Santoro et al., 2010), suggesting that the Wuchter primers are biased against deep water clades of AOA. Our study supports these conclusions, but we also found that the Francis primers underestimated amoA abundance relative to rrs in surface water samples (Figure A6 in Appendix). Comparisons of primer sequences to alignments of amoA sequences from this study show single base-pair differences within Wuchter primer binding sites that could affect primer annealing and thus amplification (Figure A8 in Appendix). Our findings support the use of two different primer sets for the quantification of Archaeal amoA in near-surface versus deep water samples, as recommended by Beman et al. (2008). Alternatively, Thaumarchaeota abundance in DNA extracted from our samples estimated by qPCR of rrs agreed well with an independent assessment based on pyrosequencing. This suggests that the 334F/534R rrs primer set originally proposed by Suzuki et al. (2000) for quantifying Marine Group 1 Archaea may be more robust than amoA primer sets for quantifying Thaumarchaeota.
The accA gene, a proposed marker for archaeal autotrophy, was found at abundances almost equal to Thaumarchaeota rrs and amoA (amplified by the Francis primers) below 100 m depth ( Table 2), in agreement with findings from the original accA survey of the Tyrrhenian Sea (Yakimov et al., 2009). accA was least abundant in surface water samples (2-70 m depth; e.g., Figure 2), especially at inshore stations and in the Mississippi River. A similar trend has been reported for South China Sea samples, where accA approached the limit of detection in samples <100 m . Since the accA primers were designed using a very small database, the apparent discrepancy between accA and Thaumarchaeota rrs abundance in near-surface samples may be due to the presence of populations in surface waters with divergent accA that are not detected by this primer set.

COMMUNITY COMPOSITION
We identified a number of clades that appear to be unique to the northern Gulf of Mexico. These were seen in rrs genes from both clone libraries and pyrosequencing reads (e.g., D5- Since the global distribution of accA genes has not been thoroughly surveyed, it is difficult to determine whether these clades are indeed unique to the Gulf of Mexico. Generally, the sub-populations of Thaumarchaeota represented by distinct OTUs of each gene grouped according to sample depth, with the most stringent segregation by depth observed for rrs and accA, which segregated as deep (200 and 450 m) and near-surface (2, 50, and 100 m) OTUs, as has been observed elsewhere for amoA (Francis et al., 2005;Beman et al., 2008Beman et al., , 2010Kalanetra et al., 2009;Church et al., 2010;Santoro et al., 2010). Archaeal amoA phylotypes retrieved from Station D5 were also distributed according to sample depth (Figure 6A), with a near-surface "Group A" and deep "Group B" (Francis et al., 2005). Since these distributions of each of these genes were determined by independent PCR amplifications, it is not possible to directly associate rrs, amoA, and accA genotypes in our samples; however, the coincident groupings of these three markers of completely different physiological functions suggest differentiation of these Thaumarchaeota populations at a genomic level. Unifrac analysis suggests that Thaumarchaeota populations at these stations resolve into three sub-populations, segregated by depth and by factors covarying with depth, with strongest separation between surface (depth < 100 m) and deep water populations (Figure 9; Figures A4 and A7 in Appendix).
A few of the accA gene sequences retrieved from Station D5 clustered with previously defined ecotypes of the"Deep Water accA Clade" (Yakimov et al., 2009(Yakimov et al., , 2011, referred to here as Deep Ecotypes 1a, 1b, and 2 ( Figure 6B). Inferred amino acid sequences of all but 8 of the 87 accA amplicons we retrieved from 200 and 450 m grouped into Deep Ecotype 2. No representatives of Deep Ecotypes 1a or 1b were identified, although a group of more divergent sequences similar to these ecotypes was evident ( Figure 6B). Since previous studies concentrated on samples from deeper waters, we have added Near-Surface Ecotypes 1a and 1b to the "Shallow Water accA Clade" (Yakimov et al., 2011). Both of the Sargasso Sea reference sequences from this clade fit into Ecotype 1a, which contained only sequences from near-surface waters (≤100 m) of the northern Gulf of Mexico. The accA sequence from "Ca. Nitrosopumilus maritimus" SCM1 (Walker et al., 2010) grouped with marine sediment clones and with "Ca. Nitrosoarchaeum limnia" SFB1 (Blainey et al., 2011); we have thus allocated these sequences to a "Nitrosopumilus-like group."We also note a distinct lineage of accA (OTU 2, "Near-Surface Ecotype 1b"; Figure 6B) containing sequences from the northern Gulf of Mexico and the South China Sea ("Shallow group II" in Hu et al., 2011).The sequences we retrieved extend coverage of the diversity of accA environmental sequences to nearsurface sites and provide additional references for refining ecotype characterizations as more sequences are added to the databases.

CONCLUSION
AOA and Thaumarchaeota were abundant in the northern Gulf of Mexico coastal waters we sampled, accounting for up to 40% (qPCR) or 54% (pyrosequencing) of the total bacterioplankton population and outnumbering AOB by 10-to 100-fold. The ratio of AOA to NOB in our samples was lower than reported in other studies, suggesting that other pathways for nitrite oxidation may be more important in the northern Gulf of Mexico than elsewhere. A diverse community of Thaumarchaeota was observed at Station D5 near the Mississippi River plume in clone libraries constructed from archaeal genes of interest (rrs, amoA, and accA), with clades that seem to be unique to waters of the northern Gulf of Mexico. Consistent with this observation, and in contrast to studies of many other coastal waters, the amoA sequence most similar to Nmar_1500, the amoA gene from"Ca. N. maritimus"strain SCM1, was only 91% similar. Through analysis of rrs sequences generated using 454 pyrosequencing, we observed distinct clades of Thaumarchaeota that were distributed primarily by depth, with clear differences between near-surface (≤100 m) and deep (>100 m) populations. The distribution of rrs sequences in clone libraries generated from samples collected at Station D5 was consistent with this pattern, suggesting that parallel differences in the composition of Thaumarchaeota populations defined by other genes observed at this station were applicable to the rest of the northern Gulf of Mexico. Finally we found correlations between abundances of Thaumarchaeota genes in this region and environmental variables depth, temperature, turbidity, pH, and oxygen; however, the manner in which these variables influence Thaumarchaeota metabolism and thus distribution remains unclear.

ACKNOWLEDGMENTS
This work was supported through National Science Foundation Grants OCE 09-43278 and ANT 08-38996 to James T. Hollibaugh. Tag pyrosequencing was funded by an award from GoMRI-LSU to Gary M. King. We thank L. Powers for helping with sample collection and CTD data processing, and C. Fichot and H. Reader for assistance in collecting satellite data to determine euphotic depth. We would like to express our appreciation to the crew of the R/V Cape Hatteras, as well as the scientists from the GulfCarbon 5 cruise that provided environmental data for this study, especially W.-J. Huang and W.-J. Cai for access to pH data, and S. Lohrenz for access to nutrient data. Funding for the GulfCarbon project was from NSF through awards OCE 07-52110 (W.-J. Cai) and OCE 07-52254 (S. Lohrenz). Finally, we would like to thank the reviewers of previous versions of this manuscript for their helpful comments.

qPCR standards
Standards for qPCR reactions were constructed as in Kalanetra et al. (2009). Briefly, environmental DNA was amplified using gene-specific sequencing primers (Thaumarchaeota rrs, Archaeal amoA, Bacterial amoA) or qPCR primers (accA) under standard PCR conditions. For Bacterial rrs qPCR, E. coli genomic DNA was used. The resulting PCR product was loaded onto an agarose gel, electrophoresed, and a band of expected product size was excised. This band was purified using the QIAquick® Gel Extraction Kit (QIAGEN) and cloned into E. coli TOP10 chemically competent cells after insertion into a TOPO 4 vector (Invitrogen) using the manufacturer's instructions. Clones were selected at random and sequenced to check insert specificity. Those with positive insertions were grown overnight in LB broth with ampicillin, and plasmids were extracted using the QIAprep Spin Miniprep Kit® (QIAGEN). Plasmids were linearized using the restriction enzyme Not I (New England Biolabs), then purified in the same manner as PCR products above. Concentrations of linearized plasmid DNA were measured with the Quant-iT™ PicoGreen® dsDNA reagent (Invitrogen) using a Picofluor handheld fluorometer (Turner Designs). Gene concentration calculations were based on measured DNA concentrations, plasmid length, and insert sequence length. Standards were then diluted to a range of 10 7 -10 1 copies µL −1 for each reaction.

Thaumarchaeota hcd gene assay
In addition to accA, another gene in the 3-hydroxypropionate/4hydroxybutyrate pathway, hcd, encoding the enzyme 4hydroxybutyryl-CoA dehydratase, has been suggested as a potential marker for carbon fixation in Thaumarchaeota (Offre et al., 2011). Primers for this gene have been developed and tested on soil Thaumarchaeota populations (Offre et al., 2011). We explored using these primers to quantify hcd abundance in our samples. We were unable to obtain the desired amplification specificity with these primers and our samples (determined by agarose gel electrophoresis then cloning and sequencing putative amplicons, see below).

Gene abundance and ratio calculations
The number of gene copies detected by qPCR (copies per reaction) was converted to environmental concentrations (copies L −1 ) using the original sample volume filtered (∼1 L), the portion of the lysate purified (800 of 2000 µL), the final volume of the purified extract (50 µL, we also measured DNA concentration in this extract), and the portion of the purified DNA extract used in each qPCR reaction (2 µL). This calculation assumes that all bacterioplankton cells were collected on the filter, that the DNA contribution from eukaryotes was negligible, that all of the DNA from all of the cells collected on each filter was released into the lysate, then extracted and purified from the lysate and detected by qPCR with 100% efficiency by our methods (see discussion and calibration in Kalanetra et al., 2009). The contribution of Thaumarchaeota to the prokaryotic population was estimated from Thaumarchaeota and Bacteria rrs abundance by assuming 1.8 rrs per Bacteria genome (Biers et al., 2009), 1.0 rrs per Thaumarchaeota genome (IMG database) or 2.0 rrs per NOB genome (Mincer et al., 2007). Thaumarchaeota abundance was then divided by the total prokaryotic abundance (Bacteria plus Thaumarchaeota; Euryarchaeota were present in some samples but were never abundant, see below, and were not measured by qPCR) to calculate the contribution of Thaumarchaeota cells to the prokaryotic community. Ratios of gene abundance in a given sample were calculated directly from the qPCR data (copies µL −1 of extract).

BEST analysis
BEST analysis was performed for all samples collected in addition to the subset of samples for which nutrient data were available. Nutrient data were collected by researchers interested in modeling phytoplankton growth and thus were only available for near-surface samples. For these samples, gene abundances were log-transformed and resemblance distances for each gene between samples were calculated using Bray-Curtis similarity; resemblances for environmental data were calculated using the Euclidean distance. The resultant similarity matrices were combined and analyzed with Biota and/or Environment matching (BioEnv) through the BEST (Clarke, 1993) procedure in PRIMER (Clarke and Gorley, 2006). The significance of BEST results for each gene was tested using 999 permutations, and the null hypothesis of no species-environment relationship was rejected for all results with p ≤ 0.001.

Gene ratios
Ratios of archaeal amoA:Thaumarchaeota rrs ranged from 0.001 (B5-760 m) to 6.6 (G1-15 m) when using the Wuchter primers to quantify Thaumarchaeota amoA (Table 2; Figure A5B). Low ratios of amoA:rrs seemed to coincide with deep (>100 m) samples (Table 2; Figure A5A). In contrast, ratios of amoA:rrs ranged from 0.002 to 1.9 with an average of 0.5 when Thaumarchaeota amoA abundance was estimated using the Francis primers. The Francis primer set detected more amoA genes below 200 m depth, sometimes up to 1000 times more than the Wuchter primer set ( Figure A6). In contrast, estimates of amoA abundance in near-surface (≤100 m) samples using the Wuchter primers were 10 to 100-fold greater than estimates based on the Francis primers ( Figure A6). Ratios of accA:Thaumarchaeota rrs ranged from 0.0002 to 1.3 ( Table 2). We detected the fewest copies of accA per Thaumarchaeota rrs in near-surface (≤100 m) samples ( Figure A5B).

Thaumarchaeota hcd genes
hcd PCR products were also obtained using the primer set from Offre et al. (2011). However, the hcd primers yielded three bands of ∼200, ∼350, and ∼400 bp by agarose gel electrophoresis. Analysis of sequences from the ∼200 bp band indicated non-specific amplification, so these sequences are not considered further. Sequences from the ∼350 and ∼400 bp bands were most similar to hcd from Thaumarchaeota (BLASTx to the RefSeq database). Since nonspecific amplification prevented reliable qPCR quantification of hcd in our samples, we did not pursue this marker further. The sequences obtained from ∼350 and ∼400 bp bands have been submitted to GenBank (NCBI) under accession numbers KC409223 to KC409237.

Community composition
As expected, phylogenetic analysis of amoA nucleotide sequences ( Figure A1A) revealed more diversity than was apparent in inferred amino acid sequences, with 47 OTUs (97% similarity cutoff; Table A4) identified in the 100 and 200 m samples from Station D5. Seventeen of the 47 amoA OTUs only contained sequences from 100 m (1-22 sequences in each OTU), while 23 OTUs only contained sequences from 200 m (1-8 sequences in each). The amoA sequence most similar to either "Candidatus Nitrosopumilus maritimus" strain SCM1 or to Nitrosopumilus sp. NM25 was obtained from the 100 m library, and it was only 91% similar to either sequence.
The accA nucleotide alignment contained 51 OTUs (97% similarity cutoff; Table A4) that clustered primarily by depth ( Figure A1B). Sequences from deep samples (200 and 450 m) were assigned to 26 OTUs (1-13 sequences in each); only 4 of these OTUs contained any near-surface (2, 50, or 100 m) sequences. Twenty-one of the OTUs contained sequences exclusively from near-surface (≤100 m) samples (1-34 sequences in each). Almost half (101) of the sequences we retrieved were at least 77% similar to accA from "Ca. N. maritimus" strain SCM1; all of these were retrieved from near-surface waters except for six sequences from 200 m.

BEST analysis
Gene abundances determined by qPCR were compared to environmental data using the BEST procedure (Clarke, 1993). Results of this analysis (Table A7) show that abundances of Bacterial amoA, Archaeal accA, and pSL12 rrs -but not Bacterial rrs -were significantly correlated with fluorescence (chlorophyll a). Abundances of both Thaumarchaeota and Bacteria rrs were correlated with beam attenuation (turbidity), in combination with salinity and either fluorescence or temperature. Archaeal amoA abundance correlated with latitude, fluorescence, and salinity. Interestingly, BEST analysis (Table A7A) showed that amoA abundance estimates obtained using the Wuchter et al. (2006) primers correlated with temperature (ρ = 0.442; p < 0.001), while amoA abundance estimated with the Francis et al. (2005) primers correlated with oxygen concentration (ρ = 0.474; p < 0.001).
Nutrient data (including nitrite, nitrate, ammonia, phosphate, and silicate; provided by S. Lohrenz) were only available for nearsurface samples. Gene abundances for Bacterial rrs, pSL12 rrs, and Archaeal amoA amplified with Wuchter primers (Table A7B) correlated with silicate in combination with other variables, although only the Bacterial rrs result was significant (p < 0.001). Only the results with the highest Spearman's rank correlation coefficient (ρ) are shown in Table A7B; however, weaker correlations to nutrients were found with the second highest result. Archaeal amoA amplified with Francis primers (ρ = 0.446; p ≤ 0.010; data not shown) and Bacterial rrs (ρ = 0.583; p < 0.001; data not shown) were correlated with nitrate, while Thaumarchaeota rrs (ρ = 0.409; p ≤ 0.018; data not shown) also correlated with nitrite and silicate together.

Community composition
Almost all of the near-surface (≤100 m) Thaumarchaeota rrs sequences were ≥98% similar to the rrs from "Ca. N. maritimus" strain SCM1, as well as to Nitrosopumilus sp. NM25, retrieved from sand taken from a Zostera seagrass bed (Matsutani et al., 2011). The group containing these sequences included a sequence retrieved from cloned PCR amplicons sequenced from a tidal creek (the Duplin River) adjacent to Sapelo Island, Georgia (Hollibaugh et al., 2011), as well as "Ca. Nitrosoarchaeum limnia" strain SFB1 (Blainey et al., 2011), which was enriched from a sample taken in the oligohaline reach of North San Francisco Bay. This contrasts with clones recovered from 200 m in the northern Gulf of Mexico, where sequences were distributed among 9 OTUs, indicating a richer community of Thaumarchaeota (agreeing with the Shannon index of these samples calculated from pyrosequencing data; Table A4). We did not recover any clones related to the pSL12like clade at Station D5, which is consistent with their low rrs abundance as estimated by qPCR.
A nucleotide alignment of accA genes from this study produced a phylogenetic tree ( Figure A1B) that supported the groupings found in trees generated from inferred amino acid alignments ( Figure 6B); however, some samples from Station D5 (mostly from 200 m depth) clustered with representatives from Deep Ecotype 1a (Yakimov et al., 2011) at the nucleotide level. Additionally, a novel deep cluster of sequences from the Gulf of Mexico and the South China Sea was identified ("Deep Ecotype 3"; Figure A1B).

Gene ratios
High ratios of amoA:Thaumarchaeota rrs genes at certain stations ( Figure A5A; Table 2) could indicate a population of AOA with multiple amoA copies per genome or the presence of a group of Archaea that are not detected by the rrs primer set we used (e.g., Beman et al., 2008;Teske and Sorensen, 2008), but that contain a homolog of the amoA gene (for example the pSL12like clade). The latter seems less likely for pSL12 in particular, given the low abundance of rrs from this group at most stations in the northern Gulf of Mexico. However, in the Mississippi River at station MR1 (salinity of 0), the abundance of pSL12 rrs genes was equal to Archaeal amoA gene abundance, regardless of the amoA primer set used, while Thaumarchaeota rrs genes were undetectable. Low ratios of amoA:rrs have been proposed to indicate a potential for heterotrophy in Thaumarchaeota (Agogué et al., 2008;De Corte et al., 2009;Kalanetra et al., 2009);however, this has yet to be confirmed definitively and may simply reflect depth-dependent shifts in sub-populations that affect our ability to quantify them by qPCR, as shown by Beman et al. (2008) and others.
Our data indicate that the ratio of amoA:rrs gene abundance decreases with depth; however, we also observed increases in the accA:rrs ratio for deeper waters (Figure A5B). The amoA:accA:rrs ratios we found are not consistent with the expected 1:1:1 ratio found in the "Ca. N. maritimus" strain SCM1 genome (Walker et al., 2010). In samples ≤100 m, this ratio is 1.8:0.1:1 or 0.5:0.1:1, while deeper samples show 0.2:0.6:1 or 0.6:0.6:1 depending on whether the Wuchter or Francis amoA primer sets were used ( Table 2). In deeper waters where Thaumarchaeota rrs are most abundant, using the Francis primers produces ratios most similar to those found in "Ca. N. maritimus" strain SCM1. Direct comparison of amoA abundances in our samples as determined by the Wuchter versus Francis primer sets ( Figure A6) demonstrate this clearly.   Wuchter et al. (2006) and Francis et al. (2005) primer sets, respectively (as mentioned in text). # For primer 806R, N's in sequence = barcode sequence region.

Frontiers in Microbiology | Aquatic Microbiology
www.frontiersin.org        Values for all four canonical axes are shown, but only CCA1 and CCA2 were used to construct a biplot of the data (Figure 8).  (Clarke, 1993) performed with PRIMER v6 software (Clarke and Gorley, 2006). Archaeal amoA W, amplified with Wuchter et al. (2006)   www.frontiersin.org