Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags

Acid mine drainage (AMD) and mine tailing environments are well-characterized ecosystems known to be dominated by organisms involved in iron- and sulfur-cycling. Here we examined the microbiology of industrial soft coal slags that originate from alum leaching, an ecosystem distantly related to AMD environments. Our study involved geochemical analyses, bacterial community profiling, and shotgun metagenomics. The slags still contained high amounts of alum constituents (aluminum, sulfur), which mediated direct and indirect effects on bacterial community structure. Bacterial groups typically found in AMD systems and mine tailings were not present. Instead, the soft coal slags were dominated by uncharacterized groups of Acidobacteria (DA052 [subdivision 2], KF-JG30-18 [subdivision 13]), Actinobacteria (TM214), Alphaproteobacteria (DA111), and Chloroflexi (JG37-AG-4), which have previously been detected primarily in peatlands and uranium waste piles. Shotgun metagenomics allowed us to reconstruct 13 high-quality Acidobacteria draft genomes, of which two genomes could be directly linked to dominating groups (DA052, KF-JG30-18) by recovered 16S rRNA gene sequences. Comparative genomics revealed broad carbon utilization capabilities for these two groups of elusive Acidobacteria, including polysaccharide breakdown (cellulose, xylan) and the competence to metabolize C1 compounds (ribulose monophosphate pathway) and lignin derivatives (dye-decolorizing peroxidases). Equipped with a broad range of efflux systems for metal cations and xenobiotics, DA052 and KF-JG30-18 may have a competitive advantage over other bacterial groups in this unique habitat.


Two-step binning and reconstruction of metagenome-assembled genomes
A two-step binning procedure was used to obtain metagenome-assembled genomes (MAGs). In the first binning step, VIZBIN (Laczny et al., 2015) was applied to inspect assembled contigs (≥ 1000 bp). K-mer based ordinations were compressed using non-linear dimension reduction applying Barnes-Hut stochastic neighbor embedding. Clusters of contigs having the same taxonomic identity were extracted. In the second binning step, GC-content and coverage information were used to resolve potential heterogeneities within recovered clusters of contigs (Supplementary Figure 4). Sequences used for the assembly of the respective contigs were recruited using BBMAP and subjected to reassembly. Reassemblies were optimized using MINIMUS2 implemented in the amos software suite (v. 3.1.0, http://amos.sourceforge.net/wiki/index.php/AMOS). The resulting subsets of contigs were considered putative genome bins.

Statistical analyses
Significant differences between sampling sites (geochemical parameters, bacterial community profiles) were analysed by one-way ANOVA (analysis of variance), using Tukey-Kramer's test for post-hoc analysis (Tukey, 1949) and applying the correction for multiple comparisons according to Bonferroni. Multiple regression analyses to determine links between particular bacterial phyla and geochemical parameters of interest were carried out with log-transformed data. Results from these analyses were validated by Mantel tests for individual environmental parameters. Single correlation analyses between selected environmental parameters and bacterial species richness were done by calculating Pearson correlation coefficients. Correlations between either individual environmental parameters or sets of environmental parameters and microbial community structure were determined by relating environmental parameters to calculated Jensen-Shannon divergences using the anosim and bioenv functions implemented in vegan (Oksanen et al., 2016). Kendall's tau coefficients were used to link the occurrence of specific taxa within phyla of interest to selected environmental parameters. Determined p-values were corrected for multiple comparisons applying the method introduced by Benjamini and Hochberg (Benjamini and Hochberg, 1995).

FIGURES
Supplementary Figure 1. Early industrial alum leaching from soft coal. Alum-leaching from alum-rich soft coal was a four-step process (A). Alum-rich coal was mined (1), crushed, ground to smaller pieces and smoldered (2) yielding ash enriched in alum. The ash was leached out using water (3) and the leachate was subsequently boiled yielding crystalline alum (4). One ton of alum-rich coal yielded on average 50 kg of crystalline alum, and 300-400 kg of leached slag as by-product (Falk, 2002). Leached slag was dumped in the surrounding of former alum factories without any precautions or remediation. Slag deposits in the surrounding of former alum factories feature a distinct red colour and they show a strongly reduced vegetation (B). A closer inspection of dumped slag revealed the presence of combustion remnants and even fully intact pieces of coal (C).
Supplementary Figure 2. Phylum-level taxonomic profiles of metagenome datasets at different stages of analysis in comparison to community profiles derived from amplicon sequencing. The taxonomic affiliation of quality-controlled metagenome sequences (qcmg reads) was determined using KRAKEN (Wood and Salzberg, 2014). Full-length 16S rRNA sequences (mg rc 16S rRNA) were reconstructed with EMIRGE (Miller et al., 2011). Starting either from single-end (SE) or paired-end (PE) reads, assembled contigs were taxonomically binned applying PHYLOPYTHIA S+ (Gregor et al., 2016). Figure 3. Phylogenetic affiliation of Chloroflexi full-length 16S rRNA gene sequences. Sequences were reconstructed using EMIRGE (Miller et al., 2011). Orange-shaded clades refer to uncharacterized groups that have been particularly abundant in iTag sequencing. The presence / absence chart (RH1, RH2, RHP) indicates agreement with the iTag sequencing data, highlighting that the reconstruction of full-length 16S rRNA gene sequences was possible for the respective dataset. The NJ phylogenetic tree was calculated using ARB (v. 6.0.2) (Ludwig et al., 2004). The scale bar refers to 0.1 changes per nucleotide; NJ = neighbour-joining.

Supplementary
Supplementary Figure 4. Reconstruction of metagenome-assembled genomes (MAGs). A two-step binning approach was applied to recover putative genomes. Quality-controlled data was assembled using megahit  as iterative DeBruijn graph-based assembler applying a k-mer range from 33 -73 (step-size 10) (A). Contigs were taxonomically binned using both single-copy marker genes present on contigs and k-mer profiles applying PHYLOPYTHIA S+ (Gregor et al., 2016) (B). In the first binning step, VIZBIN (Laczny et al., 2015) was applied to inspect assembled contigs. K-mer based ordinations were compressed using non-linear dimension reduction applying Barnes-Hut stochastic neighbor embedding. Clusters of contigs having the same taxonomic identity were extracted (C). In the second binning step, GC-content and coverage information were used to resolve potential heterogeneities within recovered clusters of contigs (D). Figure 5. Recovery of microbial biomass. Bacterial 16S rRNA gene copy numbers were determined using quantitative PCR (n = 3), as outlined in the main text. The yield of total DNA from slag (RH1-RH3), pond (RHP) and forest soil (Ref) samples was estimated (n=3) by determining the dry weight of samples used for DNA extraction and relating the dry weight to the recovered amount of total DNA.

Supplementary
Supplementary Figure 6. Alpha-and beta-diversity analyses of slag bacterial communities. Chao1 and Shannon indices were calculated as measures for estimated bacterial species richness and evenness. Beta-diversity was assessed based on Jensen-Shannon divergences (Lin, 1991). Figure 7. The pH dependence of aluminium species and the toxicity of bioavailable aluminium. The availability of aluminium species is strongly related to pH (Piña and Cervantes, 1996). At neutral or near neutral pH, aluminium is mostly present in form of oxides, which are not bioavailable.

Supplementary
With decreasing pH, aluminium is increasingly present in its bioavailable and toxic form. Aluminium toxicity is mediated by a couple of mechanisms including, for instance, the interaction with the polar heads of membrane lipids and polar groups of membrane proteins. Both impair membrane transport processes (Schroeder, 1988;Zambenedetti et al., 1994) . Aluminium cations are known to have orders of magnitude higher affinity for ATP than magnesium (affinity of Al 3+ higher by a factor of 10 7 (Macdonald and Bruce Martin, 1988)), which can have fatal effects on ATP metabolism. Due to similar chemical properties aluminium can replace iron in iron-dependent enzymes of which many are found in the respiratory chain and the tricarboxylic acid cycle. Considering that aluminium is not redox-active, the replacement of iron with aluminium renders corresponding enzymes inactive.
Supplementary Figure 8. Potential aluminium tolerance / resistance mechanisms. Exemplary mechanisms include cell wall modifications, for instance the incorporation of glycosphingolipid instead of phospholipids and capsule formation (1). Both mechanisms prevent / restrict the uptake of aluminium cations. Chelation and sequestration by specific binding proteins, phospholipids, organis acids, amino acids and phosphate are additional mechanisms to cope with aluminium toxicity (2, 3). The fatal effect of aluminium on iron-dependent enzymes (e.g. various enzymes of the respiratory chain and the tricarboxylic acid cycle) makes metabolic bypassing necessary, meaning the usage of analogous, iron-free enzymes (4). Finally, general stress responses, such as efflux systems and environmental sensing, can play a role (Piña and Cervantes, 1996). ACN = aconitase, SDH = succinate dehydrogenase, FUM = fumarase (all three are examples of iron-dependent enzymes within the TCA). The assembly of read pairs and quality control were done using USEARCH (v. 7.0) (Edgar, 2010) as outlined above (1.4 Quality control and initial processing of amplicon sequencing data). 1 Percentages refer to the proportion of read pairs that could be merged to paired-end reads. 2 Percentages refer to the proportion of paired-end reads that passed QC. 3 OTU numbers are based on OTU clustering applying an OTU threshold of 97% sequence identity. OTU numbers are based on datasets that have not been trimmed for singletons or doubletons.  (Miller et al., 2011). Closest related sequences were deduced by SINA searches (Pruesse et al., 2012) against the SILVA database (Quast et al., 2013) and phylogenetic analysis in ARB (Ludwig et al., 2004). Information about the environmental origin were taken from available publications or metadata deposited with the sequence information. If possible, information about the taxonomic affiliation was deduced from both recovered 16S rRNA genes and a phylogenetic tree (maximum-likelihood tree with FASTTREE (v. 2.1.3)) (Price et al., 2010) based on a concatenated alignment of 31 single-copy marker genes identified with AMPHORA (v. 2.0) (Wu and Scott, 2012). Completeness and contamination of the MAGs were determined with CHECKM (v. 1.0.3) (Parks et al., 2015). MAGs labeled with a, b, or c resulted from a second binning step applied to resolve strain heterogeneity as outlined in the materials and methods. 16S = 16S rRNA gene, y = 16S rRNA gene recovered from putative genome, n = recovery of 16S rRNA gene from putative genome not possible.