The undiscovered biosynthetic potential of the Greenland Ice Sheet microbiome

The Greenland Ice Sheet is a biome which is mainly microbially driven. Several different niches can be found within the glacial biome for those microbes able to withstand the harsh conditions, e.g., low temperatures, low nutrient conditions, high UV radiation in summer, and contrasting long and dark winters. Eukaryotic algae can form blooms during the summer on the ice surface, interacting with communities of bacteria, fungi, and viruses. Cryoconite holes and snow are also habitats with their own microbial community. Nevertheless, the microbiome of supraglacial habitats remains poorly studied, leading to a lack of representative genomes from these environments. Under-investigated extremophiles, like those living on the Greenland Ice Sheet, may provide an untapped reservoir of chemical diversity that is yet to be discovered. In this study, an inventory of the biosynthetic potential of these organisms is made, through cataloging the presence of biosynthetic gene clusters in their genomes. There were 133 high-quality metagenome-assembled genomes (MAGs) and 28 whole genomes of bacteria obtained from samples of the ice sheet surface, cryoconite, biofilm, and snow using culturing-dependent and -independent approaches. AntiSMASH and BiG-SCAPE were used to mine these genomes and subsequently analyze the resulting predicted gene clusters. Extensive sets of predicted Biosynthetic Gene Clusters (BGCs) were collected from the genome collection, with limited overlap between isolates and MAGs. Additionally, little overlap was found in the biosynthetic potential among different environments, suggesting specialization of organisms in specific habitats. The median number of BGCs per genome was significantly higher for the isolates compared to the MAGs. The most talented producers were found among Proteobacteria. 
We found evidence for the capacity of these microbes to produce antimicrobials, carotenoid pigments, siderophores, and osmoprotectants, indicating potential survival mechanisms to cope with extreme conditions. The majority of identified BGCs, including those in the most prevalent gene cluster families, have unknown functions, presenting a substantial potential for bioprospecting. This study underscores the diverse biosynthetic potential in Greenland Ice Sheet genomes, revealing insights into survival strategies and highlighting the need for further exploration and characterization of these untapped resources.


Introduction
Bacteria are capable of producing a myriad of chemical structures that are beneficial to their own survival (Gavriilidou et al., 2022). Some of these natural products have antimicrobial activity and are employed to wage chemical warfare with competing microorganisms (Ghoul and Mitri, 2016). Therefore, as the world searches for novel antimicrobial therapeutics in light of the antimicrobial resistance crisis (Murray et al., 2022), there is great interest in the chemical diversity of microbially produced natural products. In addition to antibiotics, the chemical diversity of microbial natural products has inspired numerous anticancer drugs (Cragg and Newman, 2013) and antivirals (Ma et al., 2020), further illustrating the value of drugs inspired by natural biosynthesis. Beyond therapeutics, natural products can inspire the synthesis of many other bioactive compounds (Cordier et al., 2008).
Under-explored extremophiles from cryospheric environments may harbor an untapped reservoir of chemical diversity due to their unique adaptations to extreme conditions, such as frequent freeze-thaw cycles, high UV radiation and low nutrient levels. Glaciers and ice sheets can harbor a diverse range of microorganisms. The bacterial community of the cryosphere is dominated by Proteobacteria and Bacteroidetes (Bourquin et al., 2022). The average microbial abundance in surface meltwaters is regionally consistent, about 10⁴ cells mL⁻¹ (Stevens et al., 2022). Glaciers and ice sheets are thus considered biomes, which are mainly microbially driven (Anesio et al., 2017). Over 200 natural products have already been discovered from polar organisms (Tian et al., 2017). Furthermore, predicted novel chemical diversity has, for example, been found in actinomycetes from polar marine soil (Soldatou et al., 2021), isolates from the Canadian Arctic (Marcolefas et al., 2019), and genomes from Tibetan glaciers (Liu et al., 2022).
In recent years, the Next Generation Sequencing (NGS) revolution (Reuter et al., 2015) has made it cheaper and easier to gain access to biosynthetic potential through genomic information. Third-generation sequencing technologies (PacBio, Oxford Nanopore Technologies) have subsequently revolutionized the field, and are now crucial for deciphering positions, orientations, etc., during the sequencing of complete genomes (Athanasopoulou et al., 2022). Metagenome sequencing can be used to obtain metagenome-assembled genomes (MAGs) without the need to have the organism in culture. Furthermore, a range of bioinformatics genome mining tools are available to detect and analyze gene clusters responsible for the production of natural products (Tracanna et al., 2017; Medema et al., 2021). The "antibiotics and secondary metabolite analysis shell" (antiSMASH) is the most used tool for detecting biosynthetic gene clusters (BGCs) within microbial genomes (Blin et al., 2021). Using manually curated rules, antiSMASH is able to identify a range of different biosynthetic pathways and provides the regions they are encoded in as output. Tools like the "biosynthetic gene similarity clustering and prospecting engine" (BiG-SCAPE) (Navarro-Muñoz et al., 2020) facilitate subsequent exploration of large datasets, for instance from mining a large number of genomes, by enabling the construction of sequence similarity networks of the resulting BGCs from thousands of genomes at once. The MIBiG (Minimum Information about a Biosynthetic Gene cluster) repository can subsequently be used for comparison of these BGCs to ones previously characterized experimentally (Terlouw et al., 2023).
Despite these advancements, it is speculated that a substantial portion of the chemical diversity present in nature remains undiscovered (Scott and Piel, 2019). Many organisms, representing diverse branches on the tree of life, have not yet been successfully cultured in the laboratory (Hug et al., 2016). Although over 50,000 MAGs from various regions around the world have been cataloged (Nayfach et al., 2020), genomes of microbes from polar environments are still under-represented.
The Greenland Ice Sheet is a microbially driven cryospheric habitat containing potentially under-explored biodiversity. In summer, the melting ice surface contains a proliferation of eukaryotic glacier ice algae, which causes the ice surface to darken by virtue of their pigmentation (Cook et al., 2020). The pigmented glacier ice algae, Ancylonema alaskanum and Ancylonema nordenskiöldii, are the main species driving this biological albedo reduction (Lutz et al., 2018). However, alongside the glacier ice algae, a diverse community of bacteria, fungi, and viruses is associated with the algal blooms. Proteobacteria, Actinobacteria, and Bacteroidetes are typically the dominant bacterial phyla (Jaarsma et al., 2023), and Chytridiomycota fungi potentially play an important role in infecting and decomposing glacier ice algae (Perini et al., 2022). Furthermore, cryoconite holes and snow form distinct habitats housing unique microbial communities (Anesio et al., 2017).
This study aims to investigate the biosynthetic potential present in bacterial genomes derived from supraglacial habitats on the Greenland Ice Sheet. In doing so, we seek to shed light on the potential of these bacteria to produce antimicrobials, as well as other natural products, that may aid their survival in the cryosphere. A genome collection was previously obtained using culturing-dependent and -independent methods, including 133 high-quality MAGs and 28 isolate draft genomes (Jaarsma et al., 2023). Here, we describe the production of complete closed genomes from those isolates, in order to mine them together with our collection of MAGs. The distribution of biosynthetic gene clusters (BGCs) was analyzed across different sample types (surface ice, cryoconite, biofilm, snow), and a comparison was made between isolates and MAGs. Additionally, the presence of BGCs in various phyla was examined to identify organisms with high potential for natural product production.

Materials and methods
Sample collection
Sample collection was carried out during the July-August 2021 Deep Purple expedition (https://www.deeppurple-ercsyg.eu/). The camp was established in the south of the Greenland Ice Sheet, situated approximately 7.5 km from the margin at coordinates 61.10138895, -46.8481389, and at an elevation of 617 m a.s.l. (Figure 1A).
Environmental samples from both the dark ice surface and cryoconite holes were gathered over a 100 × 100 m area (Figures 1B-D). Ice samples were obtained by scraping roughly two vertical cm of the dark ice surface from 30 patches within the sample area using an ice axe. These samples were stored in sterile 4 L Whirl-pak bags and allowed to melt at an ambient temperature of 5-10 °C. Cryoconite sediment was collected from 30 different cryoconite holes within the designated area using a polycarbonate aquarium pipette. A total of 18 kg of surface ice and 3.5 kg of cryoconite sediment were collected. The ice and cryoconite samples were combined and homogenized within their respective sterile bags. Subsamples of both ice and cryoconite samples were taken in 50 mL tubes and kept cool during transportation to the lab. Three 4 mL cryotubes were filled with cryoconite sediment for DNA extraction. In the case of ice samples, biomass for DNA extraction was obtained by filtering three technical replicates of 500 mL of melted ice through Sartorius cellulose nitrate filters (0.2 µm). These filters were rolled up and stored in 4 mL cryotubes. Additional samples, including a viscous suspended biofilm from a cryoconite hole and a red snow sample, were also collected. These samples were gathered in 50 mL tubes and kept cool during transportation back to the lab. A subsample of the biofilm was taken for DNA extraction. All samples designated for DNA extraction were frozen in the field camp and maintained at -20 °C during transport to Aarhus University, Roskilde, Denmark. Bacteria were isolated in axenic cultures from cryoconite, ice, biofilm, and snow samples using Petri dishes and in situ culturing setups [described in detail in Jaarsma et al. (2023)]. Cultures were maintained on Reasoner's 2 agar (R2A) (Linde et al., 2000) (Alpha Biosciences) at 5 °C.

Metagenome sequencing
DNA extraction for shotgun metagenome sequencing was carried out on one biofilm sample and three technical replicates each of cryoconite sediment and filters containing ice biomass using a DNeasy PowerLyzer Power Soil kit (Qiagen). The seven libraries were generated using the Ultra FS II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, USA) in accordance with the manufacturer's protocol. These libraries were combined in equimolar amounts and examined for insert size distribution and the presence of adapter-primer dimers using a TapeStation 4150 with a D1000 DNA ScreenTape (Agilent Technologies). The library was diluted and denatured following Illumina's recommendations, before being sequenced on a NextSeq 500 using the high-output flow cell and the v2.5, 300-cycle chemistry. The sequencing process yielded 140 gigabases that met the Q30 threshold.
Nanopore whole-genome sequencing
DNA extraction was performed using a Gentra Puregene kit (Qiagen, Hilden, Germany), following the manufacturer's protocol, except that the DNA hydration solution was replaced with 10 mM Tris, 50 mM NaCl, pH 8.0. The quality of the DNA eluates was measured on a Nanodrop 2000 spectrophotometer (Thermo Scientific) and the quantity was measured on a Qubit4 fluorometer (Thermo Scientific) using the BR DNA assay.
The corrected long reads were used for de novo whole genome assemblies with Flye (Kolmogorov et al., 2020) under default settings, using the --nano-corr flag. The assembly graph files of the genomes were visualized using Bandage (Wick et al., 2015). The assembled genomes were inspected for completeness using BUSCO (Simão et al., 2015) and its generalized bacteria_odb10 (2020-03-06) database, and were annotated using PROKKA (Seemann, 2014) under default settings.
The Type strain Genome Server (TYGS) was used to identify close relatives of the resulting nanopore genomes and MAGs, where possible (Meier-Kolthoff and Göker, 2019). Alternatively, full-length 16S rRNA genes were used to identify closest relatives in the NCBI database using BLASTn.
Genome mining workflow
BGC detection
Biosynthetic Gene Clusters (BGCs) were detected using a local version of antiSMASH version 6.1.1 (Blin et al., 2021). No gene-finding tool was used for isolate genomes as they were already annotated. MAGs were annotated using prodigal-m. Full parameters can be found in the GitHub repository (https://github.com/AU-ENVS-Bioinformatics/GR21_genome_mining).
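The annotation choice described above can be sketched as a small Python helper that assembles an antiSMASH command line per genome. This is an illustration only, not the authors' pipeline (their full parameters are in the repository cited above): the function name, file names and output directory are hypothetical, and the sketch assumes only the `--genefinding-tool` and `--output-dir` options of the antiSMASH command-line interface.

```python
from pathlib import Path


def build_antismash_command(genome: Path, out_dir: Path, is_annotated: bool) -> list[str]:
    """Assemble an antiSMASH call for one genome (hypothetical helper)."""
    # Annotated isolate genomes keep their existing gene calls ("none");
    # unannotated MAG FASTA files get gene calls from Prodigal in
    # metagenomic mode ("prodigal-m").
    genefinding = "none" if is_annotated else "prodigal-m"
    return [
        "antismash",
        "--genefinding-tool", genefinding,
        "--output-dir", str(out_dir / genome.stem),
        str(genome),
    ]


if __name__ == "__main__":
    # File names below are placeholders, not files from the study.
    print(build_antismash_command(Path("isolate_example.gbk"), Path("results"), True))
    print(build_antismash_command(Path("mag_example.fasta"), Path("results"), False))
```

Returning the command as a list rather than a single string means it could be passed directly to `subprocess.run` without shell quoting issues.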

Similarity network
BiG-SCAPE version 1.1.2 (Navarro-Muñoz et al., 2020) was used to construct a similarity network of the predicted BGCs, including singletons, together with the MIBiG dataset (Terlouw et al., 2023). Gene cluster families (GCFs) and gene cluster clans (GCCs) were created using the default 0.30 and 0.70 cutoff values, respectively.
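To give an intuition for what a distance cutoff does, the sketch below groups BGCs by simple single-linkage: two BGCs end up in the same group if they are connected by a chain of pairwise distances at or below the cutoff. This is a deliberately simplified stand-in and not BiG-SCAPE's actual family or clan assignment, which is more involved; the function name, BGC labels and distances are all hypothetical.

```python
def group_by_cutoff(distances: dict[tuple[str, str], float],
                    cutoff: float = 0.30) -> list[set[str]]:
    """Single-linkage grouping: link two BGCs when their distance is <= cutoff."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)  # registers both BGCs, even if unlinked
        if dist <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups: dict[str, set[str]] = {}
    for bgc in parent:
        groups.setdefault(find(bgc), set()).add(bgc)
    # Largest groups first; ties broken by member names for reproducibility.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


if __name__ == "__main__":
    toy = {
        ("bgc_a", "bgc_b"): 0.12,
        ("bgc_b", "bgc_c"): 0.28,
        ("bgc_a", "bgc_c"): 0.45,
        ("bgc_c", "bgc_d"): 0.55,
    }
    print(group_by_cutoff(toy, cutoff=0.30))  # stricter cutoff, family-like groups
    print(group_by_cutoff(toy, cutoff=0.70))  # looser cutoff, clan-like groups
```

With the toy distances, the stricter cutoff leaves `bgc_d` as a singleton, while the looser cutoff merges all four BGCs into one group, which mirrors how clans are broader than families.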

Data handling and visualization
AntiSMASH and BiG-SCAPE output data were parsed using a custom R script (https://github.com/AU-ENVS-Bioinformatics/RauENVS/blob/main/R/parse_antismash.R). One BGC classified as Saccharide was moved to the category "Other", due to it being the only Saccharide detected.
The similarity network was created using Cytoscape version 3.9.1 (Shannon et al., 2003). Gene cluster comparison figures were constructed using Clinker (Gilchrist and Chooi, 2021).
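The class recoding described above, followed by a per-class proportion, can be sketched in Python as follows. The authors' parsing was done with the R script cited above; this is an independent illustration in which the function name, the `min_count` rule (a generalization of "the only Saccharide detected") and the toy class labels are assumptions.

```python
from collections import Counter


def class_proportions(bgc_classes: list[str], min_count: int = 2,
                      other: str = "Other") -> dict[str, float]:
    """Fold classes with fewer than min_count BGCs into `other`, then normalize."""
    counts = Counter(bgc_classes)
    merged: Counter = Counter()
    for cls, n in counts.items():
        merged[other if n < min_count else cls] += n
    total = sum(merged.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in merged.items()}


if __name__ == "__main__":
    # Toy labels, not the study's data.
    toy = ["Terpene"] * 4 + ["NRPS"] * 3 + ["RiPPs"] * 2 + ["Saccharide"]
    for cls, frac in class_proportions(toy).items():
        print(f"{cls}: {frac:.0%}")
```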

The biosynthetic potential of Greenland Ice Sheet genomes
The collection of mined genomes included 133 high-quality MAGs and 28 genomes from bacterial isolates. Most MAGs originated from the cryoconite metagenomes (89), followed by the ice (32) and biofilm (12) metagenomes (Jaarsma et al., 2023). Isolates originated predominantly from cryoconite (13) and ice (10), with a few from the biofilm (3) and the snow (2). It was found that these 28 isolates are a less diverse group of organisms that is less abundant in the environment compared to the MAGs. Furthermore, the isolates and MAGs contain little genetic overlap, despite originating from the same environment (Jaarsma et al., 2023).
After nanopore sequencing, complete genomes were obtained for all except three of the 28 isolates, the exceptions being almost complete but not closed. Only eight genomes could be identified to species level with valid DNA-DNA Hybridization (DDH) and GC values using TYGS. This indicates that the genome collection contains many potentially new species. A full table of the genome collection including environment, closest relatives based on TYGS and the NCBI 16S rRNA database, genome size, GC content, number of genes, plasmids, rRNA operons and tRNA genes can be found in Supplementary material S1.
A total of 848 Biosynthetic Gene Clusters (BGCs) were obtained, 641 from MAGs and 207 from isolate genomes. Only 25 % of BGCs obtained from the MAGs were found to be complete (i.e., not located on a contig edge). For this reason, "glocal" alignment mode was used in BiG-SCAPE, to account for potentially fragmented BGCs. One BGC, encoding a nonribosomal peptide synthetase (NRPS), was found to be located on a plasmid, in an ice surface Herbaspirillum isolate (5I1). Out of the 71 types of BGC that can be recognized by antiSMASH, 33 were obtained in total, with 25 types in the isolate genomes and 23 types in the MAGs (Supplementary material S2).
The distribution of BGC classes, as assigned by BiG-SCAPE, differed between isolate genomes and MAGs, the most notable difference being a much larger proportion of terpenes among the MAG BGCs (42 vs. 7 % of BGCs) (Figure 2). The distribution of MAG BGCs is similar among the different environments, whereas the distribution of isolate BGCs varies more between environments. Notably, no terpene BGCs were found among snow BGCs.
On average, 5.3 BGCs were predicted per genome, with a mode of 5. The average was higher for the isolates (7.3) than for the MAGs (4.8). In addition, the median number of predicted BGCs per genome was significantly higher for the isolates (7) compared to the MAGs (5) (Wilcoxon Rank Sum test, p-value < 0.0001) (Figure 3). Organisms with a high number of BGCs are often referred to as "talented". Here, we define a producer as talented when it contains a number of BGCs that is equal to or over twice the median value. Talented producers were found among isolates as well as MAGs. When investigating the number of BGCs per genome for each phylum separately, the highest median was found in Acidobacteria (7), with the entire interquartile range (IQR) being equal to or higher than the overall median (Figure 4). The lowest median number of BGCs per genome was found in Bacteroidetes (2), and the entire IQRs of Bacteroidetes, Actinobacteria, Armatimonadetes, and Gemmatimonadetes were lower than the overall median. The most talented producers were found among the Proteobacteria, with up to 15 BGCs per genome (Supplementary material S3).
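The "talented producer" criterion defined above (a BGC count equal to or over twice the median) is simple enough to express directly. The following Python sketch uses hypothetical genome names and counts, not the study's data, and the function name is an assumption made for illustration.

```python
from statistics import median


def talented_producers(bgc_counts: dict[str, int], factor: float = 2.0) -> list[str]:
    """Return genomes whose BGC count is at least `factor` times the median count."""
    threshold = factor * median(bgc_counts.values())
    return sorted(genome for genome, n in bgc_counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical per-genome BGC counts; the median of these values is 5,
    # so the threshold is 10 BGCs.
    toy_counts = {
        "mag_01": 3, "mag_02": 5, "mag_03": 4,
        "isolate_01": 7, "isolate_02": 12, "isolate_03": 5, "isolate_04": 10,
    }
    print(talented_producers(toy_counts))
```

Because the threshold is relative to the median of the collection being analyzed, the same genome could be classed differently if the reference collection changes.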

Similarity network analysis of obtained biosynthetic diversity
Using BiG-SCAPE, the Biosynthetic Gene Clusters (BGCs) were organized into 410 Gene Cluster Families (GCFs), the biggest containing 13 BGCs. Only 215 BGCs remained as singletons (Figure 5). While 307 GCFs were unique to MAGs, 98 GCFs were unique to isolate genomes. Only five GCFs were obtained from both MAGs and isolate genomes. Cryoconite genomes harbored the most unique GCFs (217), followed by ice (66), biofilm (57), and snow (8). No GCFs were obtained that included BGCs from all four environments. The biggest overlap was found between ice and cryoconite, with 46 shared GCFs. A Venn diagram of GCFs from the different environments can be found in Supplementary material S4. Genomes from cryoconite harbored the largest proportion of unique GCFs (81 %), followed by genomes from biofilm (75 %). Genomes from snow and ice contained a smaller proportion of unique GCFs, with 57 and 54 %, respectively. It is important to note that these results can be influenced by the number of samples analyzed for each habitat.
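The counts of unique and shared GCFs reported above amount to set operations on the environments in which each GCF occurs. A minimal Python sketch is given below, assuming a mapping from GCF identifier to the set of environments of its member BGCs; the function name, GCF labels and toy data are hypothetical.

```python
from collections import defaultdict


def summarize_gcfs(gcf_envs: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Per environment: total GCFs, GCFs unique to it, and the unique fraction."""
    total: dict[str, int] = defaultdict(int)
    unique: dict[str, int] = defaultdict(int)
    for envs in gcf_envs.values():
        for env in envs:
            total[env] += 1
            if len(envs) == 1:  # GCF found in this environment only
                unique[env] += 1
    return {
        env: {
            "total": total[env],
            "unique": unique[env],
            "unique_fraction": unique[env] / total[env],
        }
        for env in total
    }


if __name__ == "__main__":
    toy = {
        "GCF_1": {"cryoconite"},
        "GCF_2": {"cryoconite", "ice"},
        "GCF_3": {"ice"},
        "GCF_4": {"cryoconite"},
        "GCF_5": {"snow", "ice"},
    }
    for env, stats in sorted(summarize_gcfs(toy).items()):
        print(env, stats)
```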
To investigate whether homologous Biosynthetic Gene Clusters have been reported previously, we compared our dataset to the MIBiG repository. Three Gene Cluster Families (GCFs) contained a MIBiG reference BGC (Figure 6). Firstly, a Non-Ribosomal Peptide (NRP) BGC from Pseudomonas protegens PF5 (BGC0000413), encoding pyoverdine, a well-described siderophore (Stintzi et al., 1999), was part of a Gene Cluster Clan (GCC) containing two GCFs (2078 and 2111). This clan, with seven BGCs from Pseudomonas isolates, was the only GCC containing BGCs from all four environments. In addition, two more MIBiG reference BGCs from Pseudomonas fluorescens clustered with two isolate BGCs. The first, a BGC encoding obafluorin (BGC0001437), which is a β-lactone antibiotic (Schaffer et al., 2017), was found in the same GCF (1404) as a BGC from an isolate from cryoconite. The second, a pseudomonine BGC (BGC0000410), encoding a siderophore (Mercado-Blanco et al., 2001), was part of a GCF (449) together with a BGC from an ice sample isolate. The latter two isolates are Pseudomonas species, and both also contain a BGC that was part of the aforementioned pyoverdine GCF.
A presence/absence plot was made of the 20 most abundant GCFs, also showing the average nucleotide identity (ANI) clustering of the 54 genomes that harbor these 20 GCFs (Figure 7). This group of genomes contained six talented producers that contained 10 or more BGCs. Where possible, similar BGCs, as predicted by antiSMASH, are noted for each GCF. Three GCFs obtained only from MAGs were related to carotenoids, with one BGC in GCF 2677 having 66 % similarity to a zeaxanthin BGC from Xanthobacter autotrophicus Py2 (BGC0000656). The similarity of the five identified BGCs in GCF 2677 was limited to one gene, crtB, which encodes a phytoene synthase that produces a colorless precursor of the carotenoid zeaxanthin (Larsen et al., 2002). In addition, the identified BGCs contained up to five other genes, potentially encoding the full biosynthesis of a carotenoid pigment (Supplementary material S5).
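The data structure behind such a presence/absence plot can be sketched in Python as follows, starting from one (genome, GCF) pair per BGC. In this sketch "most abundant" is taken to mean the GCFs with the most member BGCs, which is an assumption, as are the function name and the toy identifiers; the ANI clustering shown in Figure 7 is not reproduced here.

```python
from collections import Counter


def presence_absence(bgc_table: list[tuple[str, str]],
                     top_n: int = 20) -> tuple[list[str], dict[str, list[int]]]:
    """Build a genome x GCF 0/1 matrix for the top_n largest GCFs.

    bgc_table holds one (genome, GCF) pair per BGC.
    """
    gcf_sizes = Counter(gcf for _, gcf in bgc_table)
    # Rank GCFs by number of BGCs, breaking ties by name for reproducibility.
    ranked = sorted(gcf_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [gcf for gcf, _ in ranked[:top_n]]
    present = set(bgc_table)
    # Only genomes that carry at least one of the selected GCFs are kept.
    genomes = sorted({genome for genome, gcf in bgc_table if gcf in top})
    matrix = {
        genome: [int((genome, gcf) in present) for gcf in top]
        for genome in genomes
    }
    return top, matrix


if __name__ == "__main__":
    toy = [
        ("g1", "GCF_A"), ("g2", "GCF_A"), ("g3", "GCF_A"),
        ("g1", "GCF_B"), ("g3", "GCF_B"),
        ("g4", "GCF_C"),
    ]
    columns, rows = presence_absence(toy, top_n=2)
    print(columns)
    for genome, row in rows.items():
        print(genome, row)
```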
In accordance with the distribution of the entire collection of GCFs, there was little overlap between the 20 most abundant GCFs from MAGs and isolate genomes. One carotenoid GCF (28) was, however, observed in MAGs and isolate genomes, as well as two GCFs with unknown function (2013 and 54). The latter GCF 54 contained BGCs from genomes in separate clusters in the ANI tree.

Discussion
In this study, an extensive set of predicted Biosynthetic Gene Clusters (BGCs) was obtained from a collection of metagenomes and cultured isolate genomes obtained from supraglacial habitats on the Greenland Ice Sheet. The distribution of these BGCs revealed that isolates had a significantly higher median number of BGCs per genome compared to the MAGs. Furthermore, few Gene Cluster Families were shared between multiple sampled environments. We found evidence for the capacity to produce natural products that might aid in adaptation to life in the cryosphere. The majority of BGCs were, however, of as yet unknown function, highlighting the large potential of supraglacial bacterial genomes for bioprospecting.

Distribution of BGCs within habitats and genomes
A significantly higher median number of BGCs per genome was found for isolates compared to MAGs. Acidobacteria had the highest median number of BGCs per genome, but the Proteobacteria yielded the most talented producers. On average, more BGCs were observed per genome for both the MAGs (4.8 BGCs/genome) and isolates (7.3 BGCs/genome) in our collection than was observed in a large genome mining study of over 170,000 publicly available isolates (6.2 BGCs/genome) and over 47,000 MAGs (2.7 BGCs/genome) (Gavriilidou et al., 2022).
Very little overlap in biosynthetic potential was observed between isolates and MAGs: only five out of 410 identified Gene Cluster Families (GCFs) were shared between the two groups. This is in line with the observation that the isolates and MAGs contain little genetic overlap. Similarly, little overlap was found in the biosynthetic potential coming from the different environments. No GCFs were shared between all four environments, although this could be explained by sample size differences. In our study, it is challenging to ensure equal representation of various habitats due to differences in sample types. Efforts were made to maximize biodiversity captured in ice surface and cryoconite habitats by combining multiple subsamples for deep metagenomic sequencing. However, this approach was not feasible for the more unique snow and biofilm samples. While the biofilm was used for metagenome sequencing, the snow habitat remains under-sampled as only isolates were included. Each environment had unique GCFs, possibly encoded by specialist organisms from those specific environments. Conversely, there were also some overlapping GCFs, especially between ice and cryoconite. Microbes living in distinct environments may have gained the same functionalities; for instance, there are GCFs present among different genera (Figure 7). This could reflect horizontal gene transfer, a common route for the transfer of biosynthetic gene clusters (Fischbach et al., 2008). Alternatively, it could be that the same organism is simply found in multiple environments.
In only three GCFs did BGCs (exclusively from isolates) cluster together with a MIBiG reference BGC. The minimal similarity with MIBiG BGCs is not surprising, as this database is mainly based on cultured isolates. Even though the collection in this study contained a number of isolate genomes, they are likely understudied representatives. A similar observation was made in the analysis of MAGs from activated sludge microorganisms, where none of the predicted BGCs clustered into GCFs with MIBiG BGCs (Sánchez-Navarro et al., 2022). Furthermore, it has been estimated that only 3 % of the natural products potentially encoded in bacterial genomes have been characterized experimentally (Gavriilidou et al., 2022). Many novel BGCs were also discovered in 3,241 genomes, including MAGs and isolates, from Tibetan glaciers (Liu et al., 2022). As in the Tibetan glaciers, terpenes were the most abundant BGCs and most BGCs were obtained from Proteobacteria on the Greenland Ice Sheet. However, while Myxococcota contained the highest number of BGCs per genome within the Tibetan glaciers, they were not present in our genome collection.

Biosynthetic potential reflects environmental adaptation
While many of the predicted BGCs were of unknown function, those that could be linked to similar BGCs with a known function included BGCs encoding antimicrobials, siderophores, compatible solutes, and carotenoid pigments. It has previously been found that volatile organic compounds emitted from bare ice surfaces on the Greenland Ice Sheet include compounds with a reported antifungal activity (Doting et al., 2022). In addition, the pigment produced by glacier ice algae, purpurogallin carboxylic acid-6-O-β-D-glucopyranoside, has been suggested to have an antimicrobial activity next to its photoprotective activity (Remias et al., 2012). Our evidence for the capability to produce antimicrobial natural products adds to the indications for a potential role of natural products in microbial interactions in supraglacial habitats of the Greenland Ice Sheet. Furthermore, we found BGCs that might be involved in adaptation for life in the cryosphere. Carotenoid pigments offer protection against harmful UV radiation (Krinsky, 1978), and it is therefore not surprising that glacial microorganisms use pigments as sunscreen to protect against the harsh sunlight that is often found in high latitudes and altitudes. Carotenoid pigments are also found to regulate membrane fluidity in response to temperature fluctuations (De Maayer et al., 2014). Carotenoid pigments have for instance been identified in bacteria from Antarctica (Dieser et al., 2010; Vila et al., 2019). Carotenoid pigments are also present in the pigment pool of supraglacial communities in Greenland (Halbach et al., 2022), and in the metabolome of the Foxfonna ice cap in Svalbard (Gokul et al., 2023). Liu et al. (2022) have furthermore speculated that the large number of terpene BGCs in Tibetan glacier genomes may be explained by the presence of carotenoid pigments in bacteria.
It has been suggested that pigments in eukaryotic glacier algae not only help shield against UV radiation, but may also play a role in the production of liquid water by accelerating the melt of the local frozen environment (Dial et al., 2018). Whether the same mechanism also occurs in pigmented bacterial cells is yet to be tested, but in theory it would increase a cell's fitness if it were able to ensure a film of liquid water around itself, especially for cells living on the ice surface. It has been speculated that surface microbes other than eukaryotic algae could also play a role in biological albedo reduction (Hotaling et al., 2021). It is therefore worthwhile to investigate the extent of bacterial pigmentation on the ice surface further. In addition, evidence for the biosynthesis capacity of the compatible solute NAGGN was found among the ubiquitous GCFs. The accumulation of compatible solutes like NAGGN is linked to protection against freezing and osmotic stress (D'Souza-Ault et al., 1993; De Maayer et al., 2014). Another stressor that is associated with cold environments is low nutrient concentrations (De Maayer et al., 2014), and this may be reflected in the biosynthetic potential of supraglacial bacteria. While it has not yet been tested whether the concentration of iron on the surface of the Greenland Ice Sheet limits microbial growth, the presence of BGCs encoding siderophores suggests a role for these natural products in scavenging iron as a potential environmental adaptation.
It seems, therefore, that Greenland Ice Sheet microbes are equipped with biosynthetic potential to help survive the extreme conditions of their habitat, either through physical adaptations or through inhibition of the growth of their competitors using antimicrobial compounds. However, the presence of the biosynthetic capacity to produce the above-mentioned natural products does not necessarily mean that these natural products are actually being produced in supraglacial habitats. The ecological function of many BGCs therefore remains speculative. For instance, achieving sufficient concentrations for extracellularly active antimicrobial natural products to function is challenging, especially on the constantly diluted melting ice surface. According to the Screening Hypothesis, evolution favors organisms with a large capacity to produce a wide chemical diversity. Rather than the selective pressure acting on individual BGCs, the pressure therefore acts on the biosynthetic capacity itself, and as a consequence, not every natural product will have a biological function (Firn and Jones, 2003). A transcriptomics approach could be used to investigate whether the BGCs identified here are actually transcribed in situ on the ice sheet.

Implications for bioprospecting
The majority of BGCs identified in this study, including those in several of the most ubiquitous GCFs, were of unknown function, encoding potentially unknown chemistry. This unknown chemistry may include novel antimicrobials and other biotechnologically relevant compounds. This illustrates the high potential for bioprospecting of these under-explored organisms, but also the large need for functional characterization of many of these BGCs, including screening for activity under lab conditions.
Metagenome-assembled genomes and isolates offer distinct benefits. Isolates, although more easily cultivable, are often inadequately represented in the environment. Conversely, MAGs exhibit stronger representation, yet they frequently remain uncultured. Isolates had a significantly higher median number of BGCs per genome compared to MAGs. About 25 % of GCFs in this study were found in cultured isolates, allowing relatively more straightforward characterization of their encoded natural products. The remainder could be explored through heterologous expression, for instance in cold-adapted expression hosts such as Aliivibrio wodanis (Söderberg et al., 2019), Shewanella livingstonensis (Kawai et al., 2019), or the yeast Debaryomyces macquariensis (Wanarska et al., 2022). Many BGCs from MAGs were found to be on contig edges, despite the MAGs being of high quality. It has previously been found that high-quality MAGs assembled from long reads yield more complete BGCs compared to those assembled from short reads (Sánchez-Navarro et al., 2022), highlighting the benefits of long read-assembled high-quality MAGs for future genome mining studies.
In accordance with the low genetic overlap between the isolates and MAGs, there is also little overlap in biosynthetic potential. By harnessing both metagenomic and cultured sources, the prospects of discovering novel compounds are enhanced by tapping into complementary genetic and chemical diversity.
In conclusion, this study identified a diverse set of biosynthetic gene clusters (BGCs) in genomes from the Greenland Ice Sheet, revealing potential survival strategies and substantial untapped resources for bioprospecting in this extreme environment.

FIGURE (A) Fieldwork site on the southwest margin of the Greenland Ice Sheet. Map layers were created using ©Esri, Maxar, Earthstar Geographics, and the GIS User Community. EPSG:. The map shows the average albedo between --and --generated using a m harmonized satellite albedo (Feng et al., ). (B-D) Examples of supraglacial habitats sampled in this study; cryoconite [(B), scale bar = cm], biofilm in cryoconite hole [(C), scale bar = cm], dark ice surface [(D), scale bar = approx. m].

FIGURE Notched box plot showing the number of BGCs per genome, colored by environment. Number of genomes is given in parentheses.

FIGURE Box plot showing the number of BGCs per genome, separated by phylum. Number of genomes is given in parentheses for each phylum.

FIGURE Similarity network of obtained BGCs. Each point represents a BGC, colored by environment. Shapes indicate if the BGC was found in a MAG or isolate genome. Matches to the MIBiG database are shown as black squares.

FIGURE Pairwise alignment of genes (arrows) from isolate Biosynthetic Gene Clusters (BGCs) with genes from three homologous BGCs from the MIBiG repository (names in black).

FIGURE Presence/absence plot of the most abundant GCFs, with average nucleotide identity (ANI) clustering of genomes of their producers. The dashed line indicates an ANI of %. Producers with a name in teal are considered talented, with 10 or more BGCs encoded. Colored polygons indicate sample origin of the genomes. Where possible, names of known BGCs with similarity are given for each GCF. Pyoverdine, shown in blue, is a MIBiG reference BGC that was placed in a GCF with isolate genomes by BiG-SCAPE (see also Figure ).