Diversity of Phototrophic Genes Suggests Multiple Bacteria May Be Able to Exploit Sunlight in Exposed Soils from the Sør Rondane Mountains, East Antarctica

Microbial life in exposed terrestrial surface layers in continental Antarctica is faced with extreme environmental conditions, including scarcity of organic matter. Bacteria in these exposed settings can therefore be expected to use alternative energy sources such as solar energy, abundant during the austral summer. Using Illumina MiSeq sequencing, we assessed the diversity and abundance of four conserved protein encoding genes involved in different key steps of light-harvesting pathways dependent on (bacterio)chlorophyll (pufM, bchL/chlL, and bchX genes) and rhodopsins (actinorhodopsin genes), in exposed soils from the Sør Rondane Mountains, East Antarctica. Analysis of pufM genes, encoding a subunit of the type 2 photochemical reaction center found in anoxygenic phototrophic bacteria, revealed a broad diversity, dominated by Roseobacter- and Loktanella-like sequences. The bchL and chlL, involved in (bacterio)chlorophyll synthesis, on the other hand, showed a high relative abundance of either cyanobacterial or green algal trebouxiophyceael chlL reads, depending on the sample, while most bchX sequences belonged mostly to previously unidentified phylotypes. Rhodopsin-containing phototrophic bacteria could not be detected in the samples. Our results, while suggesting that Cyanobacteria and green algae are the main phototrophic groups, show that light-harvesting bacteria are nevertheless very diverse in microbial communities in Antarctic soils.


INTRODUCTION
Antarctica is nearly completely covered by ice, with only ∼0.32% of its surface ice-free. Although most of the ice-free regions are found in the Antarctic Peninsula and the Transantarctic Mountains, inland mountain ranges, such as the Sør Rondane Mountains (Dronning Maud Land), also represent an important fraction of the exposed surface area (Convey et al., 2008;Cary et al., 2010). The absence of vascular plants in continental Antarctica combined with the extreme environmental conditions have led to depleted soils with low availability of nutrients, especially of organic carbon, and nitrogen, and water (Kennedy, 1993;Cary et al., 2010;Tytgat et al., 2016). As a result, the mainly microscopic life in these areas (Cowan et al., 2014;Tytgat et al., 2014;Obbels et al., 2016) may thus be expected to use alternative energy sources to overcome these limitations. Sunlight, abundantly available during the austral summer, may be an important resource for certain members of the bacterial communities inhabiting exposed continental environments, and this should be reflected in the diversity of key genes for light-harvesting functions.
As life on Earth evolved, microorganisms developed different ways to harvest solar energy. Two main mechanisms have been described, either using rhodopsins or complex photochemical reaction centers that contain (bacterio)chlorophyll (Bryant and Frigaard, 2006). Early phototrophic prokaryotes (∼3.5 Giga annum ago) used reductants such as H 2 , Fe 2+ , or H 2 S for bacteriochlorophyll-dependent anaerobic anoxygenic phototrophy and did not involve oxygen (Hohmann-Marriott and Blankenship, 2011;Butterfield, 2015;Cardona, 2016). Later on (at least ∼2.4 Giga annum ago), oxygenic chlorophylldependent phototrophy, using H 2 O, arose in Cyanobacteria and played a key role in oxygenating the Earth's atmosphere (Butterfield, 2015;Cardona, 2016). Under these new atmospheric conditions, many of the anaerobic anoxygenic phototrophic bacteria may have disappeared from the now oxygenated habitats, although some groups adapted and embarked on an aerobic lifestyle (Koblížek, 2015). These aerobic anoxygenic phototrophic bacteria (AAP) were first reported in 1978 (Harashima et al., 1978) and are defined as aerobic species that synthesize bacteriochlorophyll and use light energy as an auxiliary energy source for their mostly heterotrophic metabolism (Feng et al., 2011a;Koblížek, 2015). Moreover, they do not contain carbon fixation enzymes (Yurkov and Csotonyi, 2009). Since their discovery nearly four decades ago, numerous AAP, predominantly belonging to the Proteobacteria, have been described from various habitats (Koblížek, 2015). Some species capable of aerobic anoxygenic phototrophy have also been found in the Gemmatimonadetes, Acidobacteria, and Chloroflexi. Phototrophic species belonging to the latter phylum are, however, not included in the AAP, as are many other aerobic bacteria that synthesize Bchl and perform anoxygenic phototrophy under aerobic conditions (e.g., phototrophic methylotrophs, Yurkov and Csotonyi, 2009;Zeng et al., 2014;Koblížek, 2015). The majority of all aforementioned anoxygenic phototrophs rely on a heterodimeric type 2 reaction center with pufL and pufM encoding the conserved proteins. Hence, these two puf genes are frequently used and convenient markers to study the diversity of anoxygenic phototrophic bacteria (Koh et al., 2011;Ritchie and Johnson, 2012;Koblížek, 2015).
Additionally, several other genes encoding subunits of key enzymes in the (bacterio)chlorophyll synthesis pathway are also well conserved among phototrophic microorganisms. All oxygenic and anoxygenic phototrophic bacteria use the darkoperative protochlorophyllide oxidoreductase (DPOR) enzyme complex, encoded by the chlLNB and bchLNB genes, respectively.
Besides phototrophy using photochemical reaction centers, a second type of phototrophy, employing rhodopsins also evolved (Bryant and Frigaard, 2006), although little is known about its origin in time. Microbial rhodopsins have been described in various groups, mostly in aquatic habitats, performing a range of functions, including light-driven ion pumping (Béjà and Lanyi, 2014;Boeuf et al., 2015). Although previously detected in Siberian permafrost (Petrovskaya et al., 2010), Antarctic sea ice, sea water, and continental lakes (Béjà et al., 2001;Koh et al., 2010;Qin et al., 2012;Williams et al., 2012;Yau et al., 2013;Do et al., 2014;Markowitz et al., 2014), little rhodopsin data are available for terrestrial Antarctica (based on metagenome data available on MG- RAST Wilke et al., 2016 andIMG Markowitz et al., 2014). Recently, a new family of proton pumping microbial rhodopsins, actinorhodopsins, has been discovered in freshwater Actinobacteria, (Sharma et al., 2008). To our knowledge, their occurrence in Antarctica has not been reported yet.
A first cloning survey of genes for phototrophic mechanisms in samples from the oligotrophic high-altitude soils near the Belgian Princess Elisabeth Station in the Sør Rondane Mountains revealed a high diversity of pufLM genes, whereas proteorhodopsin genes could not be amplified from any of the samples (Tahon et al., 2016). In this study, we aimed to more comprehensively assess the diversity of bacteria capable of exploiting sunlight as an alternative energy source. To further test the hypothesis that sunlight may be a very important resource for certain members of the bacterial communities inhabiting these exposed oligotrophic soils, an Illumina MiSeq paired-end 300 bp sequencing approach was used with primers targeting pufM, actinorhodopsin and bchL/chlL/bchX genes. For the latter, sequence data obtained in a previous study into the diversity of nifH genes (Tahon et al., under review), that were originally discarded because they lacked multiple of the conserved NifH amino acid residues, were revisited. Further analyses reported here identified these sequences as the NifH homologs BchL/ChlL and BchX, involved in the (bacterio)chlorophyll synthesis pathways (Fujita and Bauer, 2003;Raymond et al., 2004).

Study Site and Sample Collection
Four samples, previously used in the pilot survey (Tahon et al., 2016), were studied (Table 1), to allow comparison. During the Antarctic summer of 2009, top surface samples-mostly consisting of weathered granite parent material-were collected aseptically in the vicinity of the Belgian Princess Elisabeth Station (71 • 57 ′ S, 23 • 20 ′ E) at Utsteinen, Dronning Maud Land, East Antarctica. All samples were frozen at −20 • C upon collection. Sample KP2 was collected ∼1.3 km south of the research station. The three other samples were collected on the Utsteinen ridge, ∼500 m north of the Belgian base.

DNA Extraction
From each homogenized sample, 400 mg subsamples were taken in triplicate. Total genomic DNA was extracted and purified using the PowerLyzer R PowerSoil R DNA isolation kit (MoBio Laboratories) and a modified lysis protocol as instructed by the manufacturer. This extraction protocol was previously identified as the one yielding most bacterial diversity (Tahon et al., 2016). Following extraction, DNA was quantified using the Qubit R 2.0 fluorometer (Life Technologies) and stored at −20 • C until processing.

PCR and Preparation for Illumina Sequencing
A Veriti thermal cycler (Life Technologies) was used to amplify partial actinorhodopsin, pufM, bchL/chlL, and bchX genes. Primer selection was based on two criteria: (1) to amplify a broad diversity of the gene and (2) to produce an amplicon size suitable for Illumina MiSeq 300 bp paired-end sequencing ( Table 2). To complement the Nextera XT index kit (Illumina), primers were extended with an adapter.
For each of the soil samples, PCR was performed in triplicate on all three DNA extracts, for each primer set, resulting in a total of nine PCR products per sample per gene. PCRs were performed in 25 µl reaction mixtures containing 3 µl of genomic DNA (>6 ng µl −1 ), 1x Qiagen PCR buffer (Qiagen), 0.2 mM of each deoxynucleotide triphosphate, 0.625 U of Qiagen Taq polymerase (Qiagen), 100 mM bovine serum albumin and forward and reverse primer with final concentrations as shown in Table 2. All nine PCR products (three DNA extracts x three PCRs) were pooled and purified using the Ampure beads XT (Agencourt) protocol with slight modifications. Briefly, only 0.8 reaction volume of beads was used and DNA was resuspended in MilliQ water. Tagging of pooled PCR products was performed using the Nextera XT indices (Illumina) during an eight cycle version of the amplicon PCR with the indices replacing the primers. Afterwards, PCR products were purified as described above, with resuspension in Tris buffer (0.1 M, pH 8.5). Integrity and amplicon sizes of the PCR products were checked using a BioAnalyzer (Agilent), following quantification using a Qubit, as described above. Afterwards, samples were pooled equimolarly and sequenced on an Illumina MiSeq 300 bp paired-end platform (GATC). PhiX was spiked at 20% per lane.

Sequence Data Processing
For all genes, the forward and reverse sequencing reads were merged using the fastq_mergepairs command in USEARCH (Edgar, 2010) allowing a minimum overlap length of 8 nucleotides and a maximum of six mismatches in the overlapping region. For bchL/chlL/bchX, and pufM, merged sequences shorter than 370 and 200, and longer than 470 and 350, were removed using the fastq_minmergelen and fastq_maxmergelen commands, respectively. Primer sequences were removed from the merged sequences using cutadapt v1.8 (Martin, 2011), resulting in sequences with a minimum length of 193 and 321 bp, and a maximum length of 225 and 369 bp for pufM and bchL/chlL/bchX, respectively. Subsequently, during quality filtering using USEARCH, sequences with one or more nucleotides beneath the Phred Q20 threshold score and a maximum error >0.5 were removed from further analyses. Afterwards, all sequences were placed in reading frame +1, followed by removal of sequences showing no similarity to our genes of interest or containing stop codons and/or indels resulting in a frameshift. Detection of putative chimeric sequences was done using the Uchime model (default parameters, Edgar et al., 2011) in Mothur (Schloss et al., 2009). Finally, all remaining sequences were translated to proteins using MEGA 6 using the bacterial genetic code (Tamura et al., 2013).

Sequence Analyses
For pufM, an updated version of our previously described database containing publicly available sequences (Tahon et al., 2016) was used. For bchL/chlL and bchX a new database was assembled to contain all related sequence records from NCBI and IMG (https://img.jgi.doe.gov/) (Markowitz et al., 2012) available per November 15th 2015. Newly obtained nucleotide sequences and their derived protein sequences were added to   (Yutin et al., 2005) b From (Sharma et al., 2009) c From (Ando et al., 2005)  the databases using the import module of BioNumercs 7.5 (Applied Maths). For the sequences obtained with the primer set IGK3/DVV, to retain only BchL/ChlL (L subunit of DPOR) and BchX (X subunit of COR) sequences, NifH sequences were separated based on the presence of the conserved amino acid residues Ala43, the dipeptide Glu93-Pro94, Arg101, the dipeptide Ile104-Thr105 and Glu111 (positions based on the NifH protein sequence of Azotobacter vinelandii, accession number M20568, Fujita and Bauer, 2003). For phylogenetic analyses, all Illumina PufM sequences were clustered at a 95% cut-off using CD-HIT (Li and Godzik, 2006;Fu et al., 2012), grouping them into operational puf units (OPUs) of which one representative was used to construct the phylogenetic tree. BchL/ChlL and BchX sequences were grouped in operational BchL/ChlL units (OLUs) and operational BchX units (OXUs), respectively, at 95% cutoff. BchL/ChlL and BchX sequences were processed together for phylogenetic analyses. A first alignment was made with all sequences present in our databases, using Clustal Omega (Goujon et al., 2010;Sievers et al., 2011). Afterwards, alignments were trimmed to the size of our sequenced amplicons and visually inspected, excluding from further analysis all nonoverlapping reference sequences. Remaining sequences were realigned, after which the alignment was used to construct a maximum likelihood (ML) phylogenetic tree (1000 bootstrap replicates), by using the FastTree tree building software (Price et al., 2010) with the Whelan and Goldman evolutionary model and the discrete gamma model with 20 rate categories. From the resulting phylogenetic trees, closest relatives of our newly obtained sequences as well as representative sequences from the entire tree were selected to prepare a smaller tree representing the initial complete tree, following the same protocol. Sequences from uncultured bacteria were not included in the final tree. Trees were visualized using the iTOL software Bork, 2007, 2011) and related OPUs, OLUs, or OXUs were grouped into visual clusters that were named after cultivated bacteria that grouped in or close to the cluster. In the absence of cultivated members, the clusters were given an Utsteinen (UT) cluster number designation.

Statistical Analyses
For statistical analyses of PufM, BchL/ChlL, and BchX sequences, the Vegan package (Dixon, 2003) in R (https://cran.r-project.org/) was used. A non-normalized table of the total number of protein sequences per OPU, OLU, or OXU was used to perform rarefaction analyses and determine the total number of expected OPUs/OLUs/OXUs per sample and for the four samples combined. Parameters calculated for each dataset include species richness (Chao1) and evenness (Pielou). A normalized table (consensus of 10,000 iterations) of the total number of protein sequences per OPU, OLU, or OXU was also generated, to assess the impact on relative abundances.

Accession Numbers
Raw sequences were submitted to the NCBI sequence read archive under accession number SRP067116.

RESULTS
Actinorhodopsin genes could not be amplified from the samples and were therefore not included in the Illumina run. Sequencing and thorough quality control of sequence data resulted in 678940 high-quality pufM sequences (length 193-198 bp) and 119822 and 4950 high-quality bchL/chlL and bchX sequences (length 321-348 bp). At a 95% cut-off, the PufM protein sequences constituted 925 OPUs of which 248 were singletons. For BchL/ChlL and BchX, the sequences grouped in 207 OLUs and 48 OXUs, respectively. A total of 48 OLUs and 18 OXUs were singletons (Table 3).
When analyzing sequencing data, there is no consensus on inclusion or removal of singletons. Although many authors choose to remove them, particularly when the focus is on dominant community members (Pedrós-Alió, 2012), we have opted not to do so, for several reasons. Previous research has shown that singletons may be informative and valuable in reflecting rare and/or unique lineages of dormant or inactive bacteria that may grow when the right conditions are met (Logares et al., 2014). Removal of singletons would thus lead to a loss of power to detect these rare lineages in communities and lead to an underestimation of biodiversity levels (Kauserud et al., 2012;Zhan et al., 2013Zhan et al., , 2014Pascual et al., 2016). On the other hand, singletons may represent erroneous sequences and therefore the use of a very stringent quality control is required to accurately sort informative low-abundance sequence reads from errors and artifacts. We therefore implemented a very stringent quality control. The length of the pufM, bchX, and bchL/chlL amplicons (Table 2) allowed a large or even complete overlap between the forward and reverse sequencing reads. In the overlapping region, only six mismatches were allowed and every sequence with one or more nucleotides with a base call accuracy lower than 99% was discarded. Furthermore, since we studied protein-encoding genes rather than 16S rRNA genes, a number of additional quality control steps could be performed. The gene sequences were placed in frame +1, translated into amino acids and these were analyzed for the presence of stop codons, indels that result in a frameshift and presence of conserved sites. These steps allowed additional removal of erroneous sequences so that the number of singleton sequences was reduced a thousand fold and the final data are of much higher quality and likely represent real, though perhaps rare, diversity. Indeed, the vast majority of leftover singletons were found to group within the named clusters or close to sequences of named species (Figures 3, 4A,B, and Table 3, Table S1). As the goal of our study was to explore the (nearly) whole diversity of several protein encoding genes, including rare types, the number of singletons retrieved also aids to get more insight in the diversity coverage of the approach used. To estimate total diversity in a sample different parameters (e.g., Chao1, ACE) can be used (Gotelli and Colwell, 2010, Table 3). Calculation of these parameters takes the number of singletons into account and removal of singletons would confound the ability to estimate alpha diversity.  Table S1).
Frontiers in Microbiology | www.frontiersin.org  Figure S2). Clusters or separate OPUs/OLUs/OXUs containing less than 1% of the data were grouped together in the <1% group.
Frontiers in Microbiology | www.frontiersin.org 6 December 2016 | Volume 7 | Article 2026 As there is no consensus on the use of normalization which may lead to the omission of valuable diversity data (McMurdie and Holmes, 2014), we assessed how much diversity would be lost by normalizing. For each gene, the original OTU table was normalized 10,000 times. Based on these results, box plots were generated showing the large variation in the OTU diversity recovered after standardization ( Figure S1). Therefore, we created a consensus OTU table out of 10,000 standardizations and used this for calculating the relative abundances of reads and OTUs ( Figure S2). Surprisingly, differences with relative abundances calculated using non-normalized data (Figure 1) are limited (<0.05% for most of the OTUs and clusters). Normalization does, however, involve removal of sequence data and as a result leads to the discarding of (rare) diversity. This is most clear in the BchX dataset where the effect of normalization on sample KP43 is more pronounced as this sample had a very large number of reads and rare OTUs, many of which were removed. To avoid potential loss of rare sequence types, non-normalized data was used in the diversity assessments.
For PufM, of the 925 OPUs, 358 were detected in one sample only ( Figure S3, Table S1). The number of OPUs in each sample varied from 463 to 644 (Table 3). Of 228 shared OPUs, two contained the majority of PufM sequences and both grouped with PufM of heterotrophic alphaproteobacterial AAPs. OPU C1 contained 80.11% of all sequences and was 98.48% similar to the PufM sequence of Roseobacter denitrificans OCh 114 (Figure 1, Table S1). OPU C2 comprised 12.03% of all reads and had the PufM sequence of Loktanella sp. RCC2403 as closest match (96.97% amino acid similarity) (Figure 1, Table S1). Rarefaction analysis showed that, although the graphs started to flatten, saturation was not yet reached (Figure 2). This was corroborated by the number of estimated OPUs (Chao1) that was much higher than the number of observed OPUs (Table 3).
After ML analysis, nearly all 925 OPUs grouped into 28 clusters (Figure 3), mostly containing reads from all four terrestrial samples as well as PufM sequences originating from cultured bacteria ( Table 4). The Loktanella-like and Roseobacterlike clusters were the largest, containing 273 and 386 of the OPUs and, since they contained OPUs C2 and C1, they also contained most of the reads: 12.70 and 82.39% respectively. It should be noted that the cluster defined as Roseobacterlike also contains some PufM sequences of Tateyamaria, Erythrobacter, and Jannaschia (<0.1% of reads). The third largest cluster (PufM UT 5) grouped among less related alphaproteobacterial AAP PufM sequences and contained only 1.59% of the sequence data, mostly originating from sample KP43 (Figures 1, 3, Table 4). The other clusters and the separate OPUs each contained less than 0.85% of the reads. In the phylogenetic ML tree, our OPUs grouped with a broad variety of known PufM sequences originating from Alpha-, Beta-, and Gammaproteobacteria, and even Chloroflexi (Figure 3). Affiliations with alphaproteobacterial AAP PufM sequences, however, were most frequent. Several clusters (PufM UT 1-PufM UT 5) and separate OPUs did not group closely with known diversity, suggesting that many organisms harboring pufM genes still remain unreported. The PufM sequences appeared to be cosmopolitan: a broad diversity of cultured and uncultured sequences originating from habitats from all over the world, including polar regions, was found grouping with the new PufM sequences (Table S2). Some clusters, however, represented new PufM phylotypes, as no cultured or environmental PufM sequence was found grouping with them (e.g., cluster PufM UT 3).
Rarefaction analysis (Figure 2) showed saturation was not yet reached for the single, or the pooled samples, although the graphs started to flatten. Indeed, the estimated number of OLUs (Chao1 values) was only slightly higher than the number of observed OLUs, indicating most diversity was retrieved from the samples (Table 3). Furthermore, all samples showed a similar, very uneven distribution ( Table 3).
After ML analysis, the 207 OLUs could be grouped into 17 clusters, 12 of which grouped with BchL/ChlL sequences from known microorganisms, and three separate OLUs (C71, C107, and C115) (Figures 1, 4A). The clusters were dispersed all over the BchL/ChlL phylogeny although the majority of OLUs and reads grouped with ChlL from oxygenic photosynthetic organisms (Cyanobacteria and Trebouxiophyceae green algae) ( Table 4 and Table S1, Figures 1, 4A, and Figure S4). Notably, a small number of cyanobacterial reads from samples KP2 and KP15 (cluster Phormidesmis priestleyi Ana-like), together with sequences mainly obtained from marine unicellular Cyanobacteria ( Figure 4A and Table 4), grouped among proteobacterial BchL. This aberrant grouping was previously ascribed to the occurrence of several shared conserved signature indels, absent from ChlL of other Cyanobacteria (Gupta, 2012). The other clusters grouped with BchL of Chloroflexi and Proteobacteria or belonged to four clusters (BchL UT 1-4) that could not be assigned to a named phylum ( Figure 4A). OLUs grouping with Chloroflexi were only retrieved from sample KP15 ( Figure 4A and Table 4). Of the clusters grouping among proteobacterial BchL, most grouped with Alphaproteobacteria, some with Betaproteobacteria, and none was found grouping with Gammaproteobacteria ( Figure 4A).
No clear psychrophilic association could be deduced from the habitat metadata of the nearest neighbors. Most BchL/ChlL sequences grouped together with sequences retrieved from samples taken in a variety of ecosystems worldwide (Table S2).
A total of 4950 BchX sequences were obtained, many from sample KP43. They were binned at 95% protein similarity into 46 OXUs of which 31 were unique to one of the samples and 18 were singletons ( Figure S3, Table 3, and Table S1). No OXU was found common between all four terrestrial samples ( Figure S3). A single OXU, OXU C3, shared between samples KP15, KP43, and KP53, contained 82.10% of all BchX sequences and made up most of the reads retrieved from samples KP43 and KP53 (Figure 1, Table S1). However, phylogenetic analysis revealed that it did not group together with BchX sequences of known bacteria ( Figure 4B). The less abundant OXUs C2 (3.68%) and C32 (2.61%) grouped in the same cluster as OXU C3 (BchX UT 1) and contained nearly exclusively sequences from sample KP43 (Table S1). OXU C1 (5.72%), containing sequences from samples KP2, KP15, and KP43, grouped with the BchX sequence originating from Bradyrhizobium sp. S23321 (Table S1). All other OXUs represented less than 1% of BchX sequences.
Rarefaction analysis (Figure 2) indicated that saturation was not reached. The graph of sample KP53, however, started to flatten. This was also confirmed by the number of estimated OXUs, which was nearly identical to the number of observed OXUs for sample KP53 (Table 3). Evenness analysis resulted in very low values, except for sample KP2, which showed a more even distribution, although this could be explained by the very low number of sequences retrieved from this sample, grouping in seven OXUs (Table 3).
After ML analysis, 27 OXUs, of which 22 grouped in cluster BchX UT 1, could not be associated with a named phylum. The remaining 19 OXUs grouped with BchX sequences of 11 known bacterial taxa (Figure 4B) of Chloroflexi and Alpha-, Beta-, and Gammaproteobacteria, although the latter was represented by only one sequence (OXU C30) ( Figure 4B, Table S1). For samples KP43 and KP53, however, most of the sequences and OXUs grouped in two clusters (BchX UT 1 and BchX UT 2) or separate OXUs that could not be associated with a known BchX reference sequence (Figure 1, Table 4, and Table S1).
Similar to PufM and BchL/ChlL, no clear grouping of BchX sequences from cold habitats could be seen, as our sequences grouped with sequences retrieved from a broad diversity of terrestrial and aquatic ecosystems worldwide (Table S2).
To target a wider diversity of anoxygenic phototrophs using the photosynthetic type 2 reaction center, the more universal pufM_uniF/pufM_WAW primer set (Yutin et al., 2005) was used. Using our pufLM database assembled from publicly available sequences, an in silico comparison of these primers with other less degenerate primers (i.e., pufM_557F, pufMR and pufM_750R) used in other studies (Nagashima et al., 1997;Achenbach et al., 2001) clearly showed that the Yutin et al. primer set (Yutin et al., 2005) targets a much wider pufM diversity ( Figure S5). In addition, contrary to the clone library results (Tahon et al., 2016), the primer set used here gave successful amplification in all samples. As observed in the clone libraries, alphaproteobacterial-like PufM sequences were most frequently recovered (98.60%), followed distantly by gammaproteobacterial-like sequences (0.64%). Deep sequencing also revealed presence of some betaproteobacteriallike (0.012%) and even chloroflexi-like (0.38%) PufM (Figure 3). The dominance of alphaproteobacterial PufM sequences has been previously observed in the Arctic  and Antarctic (Karr et al., 2003;Koh et al., 2011), whereas chloroflexi-like PufM sequences have not previously been reported from polar or most other environments studied so far. Also in Arctic soils , gammaproteobacterial-like PufM sequences were found to contribute less to the general diversity.
The relative abundance of Roseobacter-like and Loktanellalike PufM in our samples is remarkably high (Figure 1). PufM from these two AAP-containing taxa have previously nearly exclusively been reported from marine and saline lake environments from polar and non-polar regions (Van Trappen et al., 2004;Oz et al., 2005;Du et al., 2006;Yutin et al., 2008;Cottrell and Kirchman, 2009;Jiang et al., 2010;Jeanthon et al., 2011;Koh et al., 2011;Ritchie and Johnson, 2012;Ferrera et al., 2014). The high relative abundance of Roseobacterlike PufM sequences (82.39%) in our terrestrial samples is therefore striking. Roseobacter has been found to be important in sulfur cycling in aquatic environments (González and Moran, 1997;Buchan et al., 2005;Wagner-Döbler and Biebl, 2006). However, the absence of Roseobacter 16S rRNA sequences in this study area  and in terrestrial Antarctic systems in general (based on metagenome data available in NCBI, MG-RAST, Wilke et al., 2016 andIMG/M Markowitz et al., 2014) suggests the presence of other microorganisms, containing PufM highly similar to that of Roseobacter, in our samples. The remainder of the PufM diversity in our samples, although recovered in small relative numbers, also mainly related to aquatic photoheterotrophic taxa. However, several of the PufM sequences recovered (e.g., Methylobacterium-like, Rhodopseudomonas-like) were highly similar to PufM reported from Arctic soils  or Chinese paddy soils (Feng et al., 2009(Feng et al., , 2011a. Thus, although aerobic anoxygenic phototrophy is frequently studied in aquatic environments, our data strongly suggest that this lifestyle may potentially be important in terrestrial ecosystems. The primers (Ando et al., 2005) previously used to amplify a broad diversity of nifH sequences also amplified structurally similar oxidoreductase subunits involved in (bacterio)chlorophyll synthesis (bchL/chlL encoding for the L subunit of DPOR in APB, Cyanobacteria, green algae, and lower land plants, and bchX encoding for a COR subunit of APB). The high relative abundance (98.26%) of cyanobacterial plus Trebouxiophyceae green algal ChlL suggested an important role for oxygenic photosynthetic organisms in our samples (Table 4, Figure 4A, and Figure S4). Notably, an inverse pattern was observed: nearly all of the ChlL reads recovered from samples KP2 (99.89%) and KP15 (98.92%) grouped with Chroococcales and Oscillatoriales Cyanobacteria, respectively (Table S1), with very few Trebouxiophyceae-like sequences (<0.12%). KP43 and KP53, on the other hand, contained far less cyanobacterial reads (0.58 and 2.96% respectively) and a very high relative abundance of Trebouxiophyceae-like ChlL (84.23 and 96.69% respectively) (Figure 1, Table S1). A similar pattern was observed previously (Tahon et al., 2016, under review): a high relative abundance of both cyanobacterial cbbL type IB (RuBisCO) and 16S rRNA, grouping with Chroococcales and Oscillatoriales Cyanobacteria, was recorded from the KP2 and KP15 samples, respectively, and much less in KP43 and KP53. A high relative number of trebouxiophyceael chloroplast 16S rRNA sequences was recorded from the latter samples (Tahon et al., under review).
However, as the primers were originally designed to amplify a broad diversity of nifH (Ando et al., 2005), and not bchL/chlL, it is conceivable that they might show a bias toward particular groups. Therefore, an in silico analysis, using a broad diversity of publicly available sequences, was performed to investigate possible primer bias. The IGK3 and DVV primers (Ando et al., 2005, Table 2) generally showed one or two mismatches, located at the 5 ′ primer end, with bchL/chlL reference data of different phyla ( Figure S5). Chlorobi sequences, however, showed most

FIGURE 4 | Continued
Frontiers in Microbiology | www.frontiersin.org FIGURE 4 | (A) ML phylogenetic tree (1000 bootstraps) of BchL/ChlL sequences. NifH sequences were used as an outgroup. Scale bar indicates 0.1 substitutions per amino acid position. OLU clusters (larger font size, labeled in gray) were named after cultivated bacteria that grouped in or close to the cluster. In the absence of cultivated members, clusters were given an Utsteinen (UT) cluster number designation. For clusters, the total branch lengths to the closest and farthest leaf of the cluster were used as sides of the triangle. OLUs not enclosed in clusters are labeled in bold. For reference data, taxon name and accession number is listed. Bootstrap values of at least 70% are displayed as circles with a diameter reflecting the height of the bootstrap value. BchX sequences are shown as a single cluster (details given in B). (B). BchX ML phylogenetic tree (1000 bootstraps). NifH sequences were used as an outgroup. Scale bar indicates 0.1 substitutions per amino acid position. OXUs not enclosed in a cluster are labeled in bold. OXU clusters (larger font size, labeled in gray) were named after cultivated bacteria that grouped in or close to the cluster. In the absence of cultivated members, clusters were given an Utsteinen (UT) cluster number designation. For clusters, the total branch lengths to the closest and farthest leaf of the cluster were used as sides of the triangle. For reference data, taxon name, and accession number is listed. Bootstrap values of at least 70% are displayed as circles. Smallest circles represent the lower cut-off of 70% with a diameter reflecting the height of the bootstrap value. BchL/ChlL sequences are shown as a single cluster (details given in A).
mismatches (four and one in DVV and IGK3, respectively). The absence of clear differences in primer specificity for different groups suggests there is unlikely to be extensive bias in the primers. Thus, the high relative abundance of oxygenic photosynthetic microorganisms in ChlL suggests they indeed appear to be an important phototrophic group in the investigated samples.
In addition to the highly recovered cyanobacterial and green algal ChlL, a small number of non-cyanobacterial BchL sequences (1.74%-2091 sequences, including 25 singletons) grouped with a broad diversity of mainly aerobic anoxygenic phototrophic bacteria primarily belonging to alphaproteobacterial taxa and to lesser extent betaproteobacterial and chloroflexi taxa ( Figure 4A).
The IGK3 and DVV primers also amplify bchX. Because this gene is not present in Cyanobacteria or in Trebouxiophyceae, the BchX dataset was relatively small (4950 sequences). The alternative explanation that primer mismatch might have reduced the number of sequences recovered, is less likely. Indeed, in silico analysis showed that the number of mismatches with a set of representative bchX sequences was limited ( Figure S5). The low number of sequences precludes firm conclusions regarding BchX diversity. The BchX sequences mainly grouped with alphaproteobacterial taxa and to a lesser extent with Betaproteobacteria, Gammaproteobacteria, and Chloroflexi (Figure 4B), however, with greater sequencing depth, relative abundances may change. Most of the BchX reads grouped into two clusters (BchX UT 1 and 2) without any close known representative, indicating the existence of multiple not yet cultured or recognized APB. Indeed many anoxygenic phototrophs can grow heterotrophically and it is thus possible that some taxa, originally described as regular heterotrophs, may have phototrophic capacities that have not been noticed. Indeed, Salinarimonas rosea DSM 21201 originally tested negative for bacteriochlorophyll a synthesis (Liu et al., 2010) and the phototrophic capacities of Skermanella stibiiresistens SB22 were originally not reported (Luo et al., 2012), whereas more recent analysis of their genomic sequences (accession numbers AUBC01000000 and AVFL01000000) revealed their phototrophic potential.
Finally, since BchL/ChlL, BchX, and NifH exhibit a high degree of protein sequence similarity (Raymond et al., 2004), and their genes can be retrieved using the same primer set, it may be a challenge to correctly annotate these sequences. During our analyses, we noticed that public databases contain several BchL/ChlL and BchX sequences annotated as NifH, and vice versa. When studying these genes and including reference data, it is therefore important to take into account gene specific conserved amino acid positions to ensure correct interpretation of data.
Comparing the datasets for different genes is difficult as datasets obtained with different primers cannot be compared directly because of differences in primer specificity, PCR efficiency or bias. Nevertheless, because the large sampling depth of PufM complicates evaluation, we have normalized PufM and BchL/ChlL datasets ( Table 3) to tentatively allow a rough comparison. This shows that pufM diversity is somewhat lower though of similar magnitude than that of . It should be noted that horizontal gene transfer may cause discrepancies between phylogenies of 16S rRNA or cbbL, and photosynthesis genes (Igarashi et al., 2000). However, tentatively, it can be noted that several of the APB taxa retrieved here were previously reported from 16S rRNA (e.g., Bradyrhizobium, Sphingomonas, Afifella, Methylibium) and cbbL (RuBisCO) (e.g., Mesorhizobium, Bradyrhizobium, Methyloversatilis, Rhodospirillum centenum) clone library and Illumina sequencing results from the same samples (Tahon et al., 2016, under review). Interestingly, the relative abundance of Bradyrhizobium-related sequences from sample KP15 was much higher than from the other samples in the three photosynthetic datasets (Table 4), as well as the 16S rRNA and cbbL datasets (Tahon et al., 2016) [Tahon et al., under review]. Furthermore, the genome of Bradyrhizobium sp. S23321-the closest neighbor to most of our Bradyrhizobium-related sequences-revealed a gene content adapted to survival in a broad range of environments (Okubo et al., 2006). The combination of these data thus suggests photoautotrophic bradyrhizobia may be present in sample KP15.
In the pilot study proteorhodopsin could not be detected (Tahon et al., 2016). Actinorhodopsin, a similar lightdriven proton pump, was originally retrieved from aquatic Actinobacteria (Sharma et al., 2008) and little is known about its presence in terrestrial environments. Because Illumina data of partial 16S rRNA genes previously showed the samples investigated here to contain diverse Actinobacteria (6.26-23.48% of reads and 20.73-34.60% of OTUs, Tytgat et al., 2016) [Tahon et al., under review], we used several primer sets ( Table 2) to amplify actinorhodopsin genes from our samples. While our attempts failed, this does not necessarily imply that these systems are absent in the terrestrial Antarctic bacterial communities. Currently available primers may be unsuitable to capture all actinorhodopsin diversity, as most reference data originates from aquatic systems (Sharma et al., 2008(Sharma et al., , 2009Wurzbacher et al., 2012;Jezberová et al., 2013;Salka et al., 2014). Future metagenome datasets may resolve this question.

CONCLUSIONS
We studied the presence of bacterial phototrophic pathways in a terrestrial Antarctic environment. While we could not detect actinorhodopsin genes, our analysis of other genes showed that a broad variety of oxygenic and anoxygenic phototrophs is present in soils of the Sør Rondane Mountains, East Antarctica. The high relative abundance of oxygenic photosynthetic microorganisms, however, suggests they are an important phototrophic group. Sequencing results of BchL, ChlL, and BchX, involved in (bacterio)chlorophyll synthesis, were dominated either by Cyanobacteria-or Trebouxiophyceaerelated sequences. Moreover, the presence of currently unknown non-cyanobacterial phylotypes suggests the existence of multiple not yet cultured or recognized anoxygenic phototrophic bacteria. Illumina Miseq sequencing of PufM, typical for light-harvesting bacteria with a type 2 reaction center, revealed a very high relative abundance of two groups of sequences, i.e., Roseobacter-like and Loktanella-like, and a large diversity of other less abundant taxa from Alpha-, Beta-, and Gammaproteobacteria, Chloroflexi and several unassigned groups. Although photoheterotrophic bacterial light-harvesting is nearly exclusively studied in aquatic environments, our results suggest the potential relevance of this mechanism in terrestrial ecosystems.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: GT, AW. Performed the experiments: GT. Analyzed the data: GT, BT. Contributed analysis tools: BT. Wrote the paper: GT, AW. All authors approved the final manuscript.