Metagenomic Insights into the Uncultured Diversity and Physiology of Microbes in Four Hypersaline Soda Lake Brines

Soda lakes are salt lakes with a naturally alkaline pH due to evaporative concentration of sodium carbonates in the absence of major divalent cations. Hypersaline soda brines harbor microbial communities with a high species- and strain-level archaeal diversity and a large proportion of still uncultured poly-extremophiles compared to neutral brines of similar salinities. We present the first “metagenomic snapshots” of microbial communities thriving in the brines of four shallow soda lakes from the Kulunda Steppe (Altai, Russia) covering a salinity range from 170 to 400 g/L. Both amplicon sequencing of 16S rRNA fragments and direct metagenomic sequencing showed that the top-level taxa abundance was linked to the ambient salinity: Bacteroidetes, Alpha-, and Gamma-proteobacteria were dominant below a salinity of 250 g/L, Euryarchaeota at higher salinities. Within these taxa, amplicon sequences related to Halorubrum, Natrinema, Gracilimonas, purple non-sulfur bacteria (Rhizobiales, Rhodobacter, and Rhodobaca) and chemolithotrophic sulfur oxidizers (Thioalkalivibrio) were highly abundant. Twenty-four draft population genomes from novel members and ecotypes within the Nanohaloarchaea, Halobacteria, and Bacteroidetes were reconstructed to explore their metabolic features, environmental abundance and strategies for osmotic adaptation. The Halobacteria- and Bacteroidetes-related draft genomes belong to putative aerobic heterotrophs, likely with the capacity to ferment sugars in the absence of oxygen. Members from both taxonomic groups are likely involved in primary organic carbon degradation, since some of the reconstructed genomes encode the ability to hydrolyze recalcitrant substrates, such as cellulose and chitin. Putative sodium-pumping rhodopsins were found in both a Flavobacteriaceae- and a Chitinophagaceae-related draft genome. The predicted proteomes of both the latter and a Rhodothermaceae-related draft genome were indicative of a “salt-in” strategy of osmotic adaptation. The primary catabolic and respiratory pathways shared among all available reference genomes of Nanohaloarchaea and our novel genome reconstructions remain incomplete, but point to a primarily fermentative lifestyle. Encoded xenorhodopsins found in most drafts suggest that light plays an important role in the ecology of Nanohaloarchaea. Putative encoded halolysins and laccase-like oxidases might indicate the potential for extracellular degradation of proteins and peptides, and phenolic or aromatic compounds.


Supplementary Figures
Supplementary Figure 1. Maximum-likelihood tree of rhodopsins found on the >5 kb contigs from the soda brine datasets and reference protein sequences. The number of rhodopsins found in the soda brine datasets within each functional group is given in brackets. Putative rhodopsins from viral reference genomes were pruned from the tree. The scale bar indicates 9% sequence difference. Selected bootstrap values (100 replicates) are shown at the nodes.

Supplementary Figure 2. Relative frequency of GC% in a subsample of 1 million sequence reads from the Kulunda soda brines: Tanatar-5 (T5, salinity S=17%), Picturesque Lake (PL, S=25%), Tanatar trona crystallizer (Tc, S=30%) and Bitter-1 (B1, S=40%). Frequencies were calculated in bins of width 5% and compared to the GC% distribution of three metagenomes obtained by 454 pyrosequencing from saline environments with neutral pH: a crystallizer pond (SP-37, S=37%) and an intermediate-salinity pond (S=19%) of the solar saltern in Santa Pola, Spain (Ghai et al., 2011), and the deep chlorophyll maximum in the Mediterranean Sea near Alicante, Spain (Med-dcm, S=3.8%; Ghai et al., 2010). The large peak in the GC profile of SP-37 was attributed to the predominance of the square archaeon Haloquadratum walsbyi, which has a low average GC content compared to other members of the Halobacteria, such as those that dominate the hypersaline soda brines.
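The binned GC% profile described in this caption can be illustrated with a minimal Python sketch. It is not the pipeline used for the figure: the function names `gc_percent` and `gc_histogram` are hypothetical, reads are assumed to be plain nucleotide strings that have already been subsampled, ambiguous bases are ignored, and the bin width is assumed to divide 100 evenly.

```python
from collections import Counter


def gc_percent(seq):
    """Return the GC content of one read as a percentage.

    Only unambiguous bases (A, C, G, T) are counted. Returns None
    when the read contains none of them.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else None


def gc_histogram(reads, bin_width=5):
    """Return {bin_start: relative_frequency} over GC% bins.

    Assumption: bin_width divides 100 evenly. Reads with exactly
    100% GC are placed in the last bin.
    """
    counts = Counter()
    n = 0
    for read in reads:
        gc = gc_percent(read)
        if gc is None:
            continue  # skip reads without any unambiguous base
        start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        counts[start] += 1
        n += 1
    if n == 0:
        return {}
    return {start: counts[start] / n for start in range(0, 100, bin_width)}


if __name__ == "__main__":
    # Toy reads for illustration only, not data from the study.
    toy_reads = ["GGCC", "ATAT", "ATGC", "ACGT"]
    for start, freq in gc_histogram(toy_reads).items():
        if freq > 0:
            print(f"{start}-{start + 5}% GC: {freq:.2f}")
```

Dividing each bin count by the number of reads counted makes profiles from datasets of different sizes directly comparable, which is what a comparison across sequencing platforms and sampling sites requires.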

Supplementary Table 10. Detected marker genes involved in carbon (C), nitrogen (N) and sulfur (S) metabolism from selected Halobacteria-related draft genomes. Marker genes were selected according to Lauro et al. (2011) and Llorens-Marès et al. (2015). Haloferacaceae-, Halobacteriaceae- and Natrialbaceae-related draft genomes are marked in light grey, light blue and light pink, respectively. Predicted protein sequences were selected by sequence-based annotation and Pfam rule-based annotation using the CAZymes Analysis Toolkit (CAT; E-value 0.001, bit score 55). Unless the contig ID is marked with an asterisk (*), query and subject sequences had a consistent length and Pfam domain(s). Query Pfam domains and subject GenBank sequence identifiers (GI), as well as putative gene products of the best hits (Blast-N against; all E-values < 1E-05) and COG identifiers, are given. CAZy families likely involved in polysaccharide degradation include chitinases (light green), cellulases and related glucanases (yellow), hemicellulases and related enzymes (dark red), and glucoamylases and related GHs (orange). GH families likely involved in the hydrolysis of mono- and disaccharides are marked blue. Results classified under CAZy families GH 109 and GH 93 (encoding putative oxidoreductases and transcriptional regulators, respectively) are not shown. 1 Some enzymes belonging to this family might hydrolyse hemicellulose (e.g. xylanases), mono- or disaccharides.
