Shotgun metagenomic data reveals significant abundance but low diversity of “Candidatus Scalindua” marine anammox bacteria in the Arabian Sea oxygen minimum zone

Anaerobic ammonium oxidizing (anammox) bacteria are responsible for a significant portion of the loss of fixed nitrogen from the oceans, making them important players in the global nitrogen cycle. To date, marine anammox bacteria found in both water columns and sediments worldwide belong almost exclusively to “Candidatus Scalindua” species. Recently the genome assembly of a marine anammox enrichment culture dominated by “Candidatus Scalindua profunda” became available and can now be used as a template to study metagenome data obtained from various oxygen minimum zones (OMZs). Here, we sequenced genomic DNA from suspended particulate matter recovered at the upper (170 m deep) and center (600 m) area of the OMZ in the Arabian Sea by SOLiD and Ion Torrent technology. The genome of “Candidatus Scalindua profunda” served as a template to collect reads. Based on the mapped reads marine anammox Abundance was estimated to be at least 0.4% in the upper and 1.7% in the center area. Single nucleotide variation (SNV) analysis was performed to assess diversity of the “Candidatus Scalindua” populations. Most highly covered were the two diagnostic anammox genes hydrazine synthase (scal_01318c, hzsA) and hydrazine dehydrogenase (scal_03295, hdh), while other genes involved in anammox metabolism (narGH, nirS, amtB, focA, and ACS) had a lower coverage but could still be assembled and analyzed. The results show that “Candidatus Scalindua” is abundantly present in the Arabian Sea OMZ, but that the diversity within the ecosystem is relatively low.


INTRODUCTION
Anaerobic ammonium oxidation (anammox) is mediated by a specialized group of bacteria that have many unique properties including the synthesis of ladderane lipids (Damsté et al., 2002), the presence of a prokaryotic cell organelle (van Niftrik and Jetten, 2012), and a metabolism using reactive intermediates hydrazine and nitric oxide (Kartal et al., , 2013. Most anammox bacterial cultures have been enriched from wastewater treatment environments, and therefore much of anammox research is directed toward application of these bacteria in sustainable man-made treatment systems (Kartal et al., 2010). In addition, it has become clear that anammox bacteria contribute significantly to the loss of fixed nitrogen from both marine and terrestrial ecosystems (Kuypers et al., 2003(Kuypers et al., , 2005Zhu et al., 2013).
First studies on anammox bacteria diversity based on 16S rRNA gene sequences in suboxic waters, as well as marine and freshwater sediments, concluded that all environmental sequences were closely related to the "Candidatus Scalindua" genus and that the diversity was generally low in comparison with other systems, such as wastewater treatment plants (Woebken et al., 2007;Hu et al., 2010). However, another study by Woebken et al. (2008) revealed a significant microdiversity within the Scalindua genus by sequence analyses of 16S rRNA genes and the 16S-23S rRNA internal transcribed spacer. So far 16S rRNA, hydrazine synthase (hzsA), hydrazine dehydrogenase (hdh), and nirS genes have been used to estimate the diversity and abundance of anammox bacteria (Schmid et al., , 2008Li et al., 2011;Harhangi et al., 2012). The primers used for amplification of anammox 16S rRNA genes are hampered by their lack of specificity Woebken et al., 2008). In the most extreme case only 1 in 480 clones analyzed could be attributed to anammox in a sample from the Black Sea . NirS genes seem only be present in the genera Scalindua and Kuenenia genome assemblies, while Planctomycete KSU-1, Jettenia and Brocadia seem to employ a nirK type of nitrite reductase, limiting the use of nirS as biomarker for the detection of anammox bacteria (Gori et al., 2011;Hira et al., 2012;Hu et al., 2012). Hydrazine dehydrogenase genes are present in multiple divergent copies in each anammox genome assembly making their use as phylomarkers at least cumbersome (Schmid et al., 2008). Recently primers for the amplification of hydrazine synthase genes became available (Harhangi et al., 2012), and were shown to be quite specific and sensitive. Nevertheless, PCR primers are never better than the data they are based on and thus primer independent methods to study anammox diversity are highly desirable. Here we used shotgun metagenome data obtained by Next Generation Sequencing (NGS) of DNA from two depths in the Arabian Sea OMZ to retrieve non primer-biased information on the biodiversity of marine anammox bacteria in these ecosystems. The oxygen minimum zone (OMZ) of the Arabian Sea represents a globally important site for oceanic nitrogen loss (Lam and Kuypers, 2011;Pitcher et al., 2011). Its vertical distribution extends from approximately 150-2000 meters below sea level with very low oxygen concentrations, making it one of the most expansive and intense OMZ in the global ocean. Loss of fixed nitrogen from the Arabian Sea OMZ may occur via both denitrification and anammox (Ward et al., 2009;Jensen et al., 2011) although the contribution of each process may vary with season and location.
Earlier work using shotgun metagenomic and metatranscriptomic data has shown that anammox bacteria are abundant and active at the core of the Eastern Tropical South Pacific OMZ off Chile OMZ (Stewart et al., 2012), but likely underestimated anammox abundance due to the absence of a closely related reference sequence. Here, we used the recently available genomic template of "Candidatus Scalindua profunda" (van de Vossenberg et al., 2012) to assess anammox abundance and diversity through 16S rRNA gene reads and the marker genes hydrazine synthase and hydrazine dehydrogenase.

SAMPLING AND SAMPLE PREPARATION
Arabian Sea depth profiles are described in (Pitcher et al., 2011). Large-volumes of seawater (200-1700 L) were filtered through 142-mm diameter 0.2-µm polycarbonate filters (Millipore, Billerica, MA). Filters were cut into fragments prior to extraction. Cells were lysed by bead-beating with 1.5 g of sterile 0.1 mm zirconium beads (Biospec, Bartlesville, OK) in a extraction buffer containing 10 mM Tris-HCl pH 8, 25 mM Na 2 EDTA pH 8, 1% (v/v) sodium dodecyl sulfate (SDS), 100 mM NaCl, and molecular biology grade water. Samples were incubated at 70 • C for 30 min and then extracted with phenol-chloroform (Sambrook et al., 1989). After extraction, DNA was precipitated using ice-cold ethanol, dried, and re-dissolved in 100 µl of 10 mM Tris-HCl, pH 8. Total nucleic acid concentrations were quantified spectrophotometrically (Nanodrop, Thermo Scientific, Wilmington, DE, USA) and checked by agarose gel electrophoresis for quality. Extracts were kept frozen at −80 • C.

SOLiD library preparation and sequencing
Aliquots of DNA from station PA2 (170 m) and PA5 (600 m) were prepared for SOLiD libraries following manufacturer's instructions (Life Technologies, Carlsbad, CA, USA). In essence 5 µg genomic DNA (gDNA) was used as input material, sheared to ∼180 bp fragments, end-repaired, barcoded sequencing adaptors were ligated by blunt-end ligation. Finished libraries were analyzed by Bioanalyzer (Agilent, Santa Clara, CA, USA) and by Qubit (Life Technologies, Carlsbad, CA, USA) prior to use. Four libraries were pooled in equimolar amounts, and E80 ePCR was performed following manufacturer's instructions (Life Technologies, Carlsbad, CA, USA) with 0.7 pM final library concentration. Each E80 library pool consisting of 4 samples was finally sequenced on one SOLiD4 sequencing slide (Table A1). For PA2 one library was prepared while for PA5 two independent libraries were prepared to yield sufficient data, and run on a SOLiD0753. The 50 nt color coded reads were separated according to barcode and analyzed using the CLC genomics workbench (v4.9, CLCbio, Aarhus, Denmark) as described below.

Ion torrent library preparation and sequencing
All kits used in this section were obtained from Life technologies (Life technologies, Carlsbad, CA, USA). For both samples an identical library preparation was performed. Genomic DNA (extraction described above) was sheared for 7 min using the Ion Xpress™ Plus Fragment Library Kit following the manufacturer's instructions. Further library preparation was performed using the Ion Plus Fragment Library Kit following manufacturer's instructions. Size selection of the library was performed using an E-gel 2% agarose gel, resulting in a median fragment size of approximately 330 bp. Emulsion PCR was performed using the Onetouch 200 bp kit and sequencing was performed on an Ion Torrent PGM using the Ion PGM 200 bp sequencing kit and an Ion 318 chip.

Extraction and analysis of reads similar to "Candidatus scalindua profunda"
The genome of "Candidatus Scalindua profunda" was used as a template to harvest reads using read mapping as implemented in the CLC genomics workbench. Matching reads (mismatch penalty 2, In/Del penalty 3) and at least 80% identity over 50% of the read length were extracted. Subsequently, the identity of the reads was confirmed with a BLASTx analysis against the "Candidatus Scalindua profunda" proteins, using an E-value cutoff of 1 for SOLiD reads and 1e −4 for IonTorrent reads (Stewart et al., 2012). To assess diversity of "Candidatus Scalindua profunda" in the samples a single nucleotide variation (SNV) analysis was performed on consensus sequences of hydrazine synthase (scal_01317, hzsA) and hydrazine dehydrogenase (scal_03295, hdh) genes using the CLC genomics workbench. Consensus sequences were generated by iterative mapping of the Ion Torrent PA5 reads as described previously (Dutilh et al., 2009), using the CLC genomics workbench read mapper with mismatch penalty 2, In/Del penalty 3 and at least 80% identity over 50% of the mapping reads.
For 16S rRNA gene phylogenetic analysis, reads matching the "Candidatus Scalindua profunda"16S rRNA gene with a positive score (mismatch penalty 2, In/Del penalty 3) and at least 90% identity over 50% of the read length were extracted. These parameters result in an overestimation, thus subsequent phylogenetic assignment is required to remove false positives.

PHYLOGENETIC ANALYSIS
The phylogenetic affiliation of the partial bacterial 16S rRNA gene sequences affiliated to the Planctomycetes group were compared to release 111 of the Silva SSU Ref database [http://www.arb-silva. de; (Quast et al., 2012)] using the ARB software package (Ludwig et al., 2004). The partial sequences generated in this study were added to the reference tree supplied by the Silva database using the ARB Parsimony tool. Putative and annotated hzsA gene sequences were translated to protein and aligned by Muscle (Edgar, 2004) in Mega5 software (Tamura et al., 2011) and edited manually. Phylogenetic reconstruction of putative hydrazine synthase proteins was performed by maximum likelihood in PhyML v3.0 (Guindon et al., 2010) using the LG model plus gamma distribution including estimated amino acid frequencies (LG + G + F) indicated by ProtTest 2.4 (Abascal et al., 2005). Branch support was calculated with the approximate likelihood ratio test (aLRT).

SAMPLING SITES AND OVERVIEW OF SEQUENCING RESULTS
The Arabian Sea is one of the most expansive and intense OMZ in the global ocean, and presence of anammox bacteria through activity tests and molecular surveys has been documented previously (Jensen et al., 2011;Pitcher et al., 2011) We used gDNA from suspended particulate matter obtained at 170 m (station PA2), at the upper limit of the OMZ, and at 600 m depth (station PA5), at the core of the OMZ, that was collected during a sampling campaign in January 2009 (for details see Pitcher et al., 2011). The gDNA from both stations was subjected to SOLiD and Ion Torrent sequencing (Table 1) yielding 7.8 and 8.6 gigabases of information for station PA2 and PA5 respectively. At the upper station PA2 between 0.4 and 0.9 percent of the reads matched to coding regions of the "Candidatus Scalindua profunda" genome (van de Vossenberg et al., 2012), while at the core OMZ station PA5 between 1.7 and 3.5 percent of the reads matched. The estimated percentage of Scalindua sp. based on the SOLiD reads is probably an overestimate, as BLASTx analysis on the matching reads revealed that about 28% of those reads did not give a hit to a "Candidatus Scalindua profunda" protein (Table A1) using an E-value of 1. The longer Ion Torrent reads gave a more robust estimate, as only 0.3% of the reads did not match to corresponding protein in BLASTx searches using an E-value of 10 −4 . These abundance estimates are in good agreement with anammox abundance estimates reported for the upper and core regions of the chilean OMZ (Stewart et al., 2012). The total "Candidatus Scalindua profunda" gene coverage in station PA2 and PA5 was 13-fold (using 7.8 gigabases; Table 1) and 57-fold (using 8.6 gigabases; Table 1), respectively. The high number of reads matching "Candidatus Scalindua profunda" genes in the core of the Arabian sea OMZ (station PA5) made analysis of the diversity in 16S rRNA and core genes of the anammox metabolism possible.

PRESENCE OF CORE GENES OF ANAMMOX METABOLISM
On average, the core anammox genes listed in Table 2 had a higher (78.7 fold) coverage than the other genes (57 fold). Most notably hydrazine synthase (hzs) and hydrazine dehydrogenase (hdh) genes were covered up to no less than 155-fold in station PA5, and 35.9-fold in station PA2. Based on their high coverage in station PA5, hzsA and hdh were selected to assess diversity of "Candidatus Scalindua" at the core of the Arabian sea OMZ (see below).
The range of coverage of the octaheme hydroxylamine oxidoreductase hao genes was more variable with scal_02116 homolog barely detected at station PA2. The nitrate:nitrite oxidoreductase nxrAB and nitrite reductases nirS were well covered in both stations. Also genes encoding for nitrite (focA) and ammonium transport (amtB) were mapped by a high number of reads ( Table 2). The genes encoding for the carbon assimilation via the acetyl-coA pathway were well covered with formyl-tetrahydrofolate synthase at 153-fold coverage standing out. Taken together, the results show that genomic potential for anammox core metabolism is well represented in both the upper and core OMZ of the Arabian Sea, which coincides with the biogeochemical and lipid biomarkers data obtained in previous studies (Jensen et al., 2011;Pitcher et al., 2011).  Sequence reads affiliated to the "Candidatus Scalindua" cluster were closely related to sequences previously recovered from the Arabian Sea . First studies on anammox bacteria diversity based on 16S rRNA gene sequences in suboxic waters, as well as marine and freshwater sediments, concluded that all environmental sequences were closely related to the "Candidatus Scalindua" clade and that the diversity was generally low in comparison with other systems, such as wastewater treatment plants (Woebken et al., 2007;Hu et al., 2010). However, another study by Woebken et al. (2008) revealed a significant microdiversity within the "Candidatus Scalindua" group by sequence analyses of 16S rRNA genes and the 16S-23S rRNA internal transcribed spacer. In our study, by analyzing the sequence reads obtained from highthroughput sequencing methods, we avoided well-known PCR biases that might result in a misleading interpretation of the anammox bacteria 16S rRNA-based diversity. As seen in Figure 1, most of the reads are closely related to 16S rRNA gene sequences attributed to anammox bacteria previously recovered from the Arabian Sea . Some reads are also related to sequences recovered from other OMZ such as Namibia, Peru and Black Sea . But in general the diversity detected by 16S rRNA gene reads indicates a very low diversity within the studied anammox population.

ANAMMOX BACTERIA DIVERSITY BASED ON hzsA AND hdh GENES
To further investigate the diversity of the anammox population in the OMZ, the SNV in consensus sequences of the highly covered hzsA and hdh genes (scal_01318 and scal_03295 respectively) was analyzed. Since the distribution of the SOLiD reads was very non-uniform, only the Ion Torrent reads were taken into account in this analysis. The hzsA and hdh genes had 33 (1.4%) and 56 (3.7%) variable positions occurring in >20% of the reads respectively. Although Ion torrent reads are prone to homopolymer errors, both coding sequences contained only a single homopolymer stretch longer than four nucleotides (GGGGG in hzsA and AAAAAA in hdh). Manual inspection confirmed that no variant was called at these sites. Additionally it has been shown elsewhere that, although there is an increase in error rate for homopolymers stretches, it is minimal for stretches up to four nucleotides (Bragg et al., 2013;Ross et al., 2013).
The variable positions were distributed homogeneously over the entire length of the sequence (Table S1). Close inspection of the read mapping suggests that two strains of "Candidatus Scalindua" make up the majority of the anammox fraction of our sample. The higher number of SNVs in the hdh gene is possibly caused by the presence of two near-identical copies per anammox strain, as has been reported for other anammox bacteria (Strous et al., 2006;Hira et al., 2012).

Frontiers in Microbiology | Terrestrial Microbiology
February 2014 | Volume 5 | Article 31 | 6 Nunoura et al., 2013). The final PA5 consensus sequence was 83% identical at nucleotide level to the "Candidatus Scalindua profunda" reference and instead clustered with two clones obtained from the Barents Sea (95% identity) (Figure 2). BLASTn against the NCBI nt database confirmed that these Barents Sea sequences are indeed the closest relatives available. HzsA sequences seem to cluster according to environment, but it is worth noting that the number of environments sampled is very low. Differences in the sequence of the hzsA gene in PA2 (170 m) and PA5 (600 m) might indicate the upper and core areas of the OMZ water column are inhabited by different though related anammox populations. These two anammox populations could be segregated based on differences in physicochemical conditions in the Arabian Sea OMZ. Pitcher et al. (2011) reported similar oxygen and nitrite concentrations at 170 and 600 m depth (oxygen, 4.5 and 3.4 µM; nitrite, 0.6 and 0.52 µM, respectively), while ammonia concentrations were 2.5-fold higher at 170 m respect to 600 m depth, suggesting that the anammox population inhabiting the core of the OMZ could be limited in ammonia. The close relation of anammox genes is certainly influenced by the mapping methods used to retrieve the reads that match to the hzsA or hdh genes. All reads that are more than 20% different are excluded from the analysis. A much higher coverage and a much longer read length, and powerful calculating power would be needed to make better draft assemblies and BLASTx analysis to discover new divergent hzsA or hdh genes and would also be necessary to discover more divergent anammox species.

CONCLUDING REMARKS
NGS technology is a powerful method to obtain genomic information of marine anammox bacteria in OMZ ecosystems. The coverage by SOLiD and Ion Torrent was high enough to analyze the important hzsA and hdh core anammox genes. Together with 16S rRNA gene phylogenetic analysis, it was shown that the diversity of the marine anammox in the Arabian Sea was lower than previously determined by PCR methods.