Utilization of -omic technologies in cold climate hydrocarbon bioremediation: a text-mining approach

Hydrocarbon spills in cold climates are a prominent and enduring form of anthropogenic contamination. Bioremediation is one of a suite of remediation tools and has emerged as a cost-effective strategy for transforming these contaminants in soil, ideally into less harmful products. However, little is understood about the molecular mechanisms driving these complex, microbially mediated processes. The emergence of -omic technologies has led to a revolution within environmental microbiology, allowing for the identification and study of so-called ‘unculturable’ organisms. In the last decade, -omic technologies have emerged as a powerful tool for filling this gap in our knowledge of the interactions between these organisms and their environment in vivo. Here, we utilize the text-mining software VOSviewer to process metadata and visualize key trends relating to cold climate bioremediation projects. Text mining of the literature revealed a shift over time from optimizing bioremediation experiments at the macro/community level to, in more recent years, focusing on individual organisms of interest, interactions within the microbiome and the investigation of novel metabolic degradation pathways. This shift in research focus was made possible in large part by the rise of omics studies, which allow research to focus not only on which organisms and metabolic pathways are present but also on those that are functional. However, all is not harmonious, as the development of downstream analytical methods and associated processing tools has outpaced sample preparation methods, especially when dealing with the unique challenges posed by soil-based samples.


Introduction
Petroleum hydrocarbon (HC) contamination is a common, extensive and pervasive form of anthropogenic pollution in the Arctic and Antarctic (Whyte et al., 1999; Errington et al., 2018). The incidence of HC spills in these environments is correlated with increased human activities such as tourism, scientific exploration and exploitation of natural resources, and spills often result from accidents or past mismanagement of waste (Mohn et al., 2001; Bennett et al., 2015; Camenzuli and Freidman, 2015). In Antarctica, terrestrial spills are often localized to permanent bases and range from small spills at re-fueling sites through to larger incidents from storage tanks, which can affect hundreds of square meters up to a few square kilometers of soil (Tin et al., 2009; Jesus et al., 2015). In the Arctic, an estimated 42% of structures built upon permafrost are at risk due to permafrost thawing (Ramage et al., 2021). The implications of this for hydrocarbon spills can be seen in events such as the 2020 collapse of a diesel storage tank due to melting permafrost, which released 21,000 tonnes of diesel into surrounding waterways including the Ambarnaya river (Glanville et al., 2020). Given the enormity of legacy and likely future spills, particularly in the Arctic with its permafrost-vulnerable infrastructure (Glanville et al., 2020; Ramage et al., 2021), mitigating environmental damage through remediation and clean-up is vital.
Unlike in more temperate climates, where natural attenuation of hydrocarbon spills is more rapid, terrestrial hydrocarbon spills in cold climates can persist in the environment for decades (Revill et al., 2007). While the toxicity of hydrocarbon pollutants has been shown to decline with age, ecotoxicology tests have demonstrated that contaminants from Antarctic diesel can impact the health of invertebrates even after extended aging (Brown et al., 2016, 2023). The persistence and toxicity of hydrocarbons in otherwise relatively remote and pristine locations make the remediation of these contaminants a matter of importance for the continued social licence to conduct scientific and other endeavors in these unique environments. The most common and effective method applied in cold regions utilizes endemic microorganisms in a process known as bioremediation (Whyte et al., 1999; Mohn et al., 2001; Josie et al., 2014; Camenzuli and Freidman, 2015; van Dorst et al., 2021). Bioremediation projects vary in sophistication and labor costs along a spectrum, from natural attenuation (effectively 'sit back' and monitor), to in situ treatment, to excavation and treatment of the soil in landfarms or engineered biopiles, in which heat, water availability, nutrient content and microbial composition can be monitored and controlled (Ruberto et al., 2003; Bento et al., 2005; Kauppi et al., 2011; Gutiérrez et al., 2020; Johnsen et al., 2021; van Dorst et al., 2021).
On-site bioremediation can have lower economic and environmental costs than off-site disposal or treatment. This has made bioremediation an attractive option for the clean-up of hydrocarbon spills in cold climates (Ruberto et al., 2003; Martínez Álvarez et al., 2015; Errington et al., 2018). However, cold desert climates such as those of the Arctic and Antarctic also pose unique challenges for the bioremediation of hydrocarbons. The extreme low temperatures and short summers reduce the volatilization and bioavailability of hydrocarbons in the soil (Pawar, 2012; Gomez and Sartaj, 2013; Cipullo et al., 2019). In addition to cold temperatures, essential resources such as nitrogen, phosphorus and water are scarce, resulting in a comparatively low microbial load and activity. Relative to temperate environments and controlled laboratory conditions, natural attenuation rates are negligible under these harsh conditions (Pawar, 2012; Gomez and Sartaj, 2013; Josie et al., 2014; Cipullo et al., 2019).
Omic technologies have emerged as powerful tools for untangling how native microbes respond to these unique challenges, as well as how they respond to the rapid changes brought about by bioremediation processes themselves (Gupta et al., 2020; Josie et al., 2014; Ramírez-Fernández et al., 2021; Zhang et al., 2022). Antarctic soils are dominated by microbial communities, which make up the majority of local genetic diversity and drive major geochemical cycles (van Dorst et al., 2021). Technologies such as 16S community profiling, metagenomics, proteomics and transcriptomics enable researchers to monitor the composition and function of microbial communities both in their natural state and once impacted by change (Eckford et al., 2002; Han, 2013; Magalhães et al., 2014; Simas et al., 2015). Given that the objective of bioremediation projects is to reduce environmental harm by removing contaminants such as HCs from the soil without causing further damage and disruption, understanding the metabolic pathways responsible for HC degradation, as well as other key nutrient cycling pathways, is crucial in developing effective bioremediation strategies (Josie et al., 2014; Gupta et al., 2020; Ramírez-Fernández et al., 2021; Zhang E. et al., 2021).

Methods
The corpus used for this study was generated with a Scopus search using the keywords 'bioremediation AND (Antarctica OR ARCTIC OR COLD) AND hydrocarbons', which yielded 104 research articles. After omitting articles related to heavy metals and other xenobiotics, 71 articles were selected. Three additional searches were conducted in which the term Antarctica was replaced with 'genomic', 'proteomic' or 'transcriptomic'. The search period was restricted to 2002 to 2022. Articles were filtered for relevance, with review articles and articles focusing on heavy metals and other xenobiotics omitted. In total, 117 articles were added to a Mendeley library before being imported into VOSviewer version 1.6.17 (van Eck and Waltman, 2011).
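The search strategy described above can be sketched as a short script that builds the four query strings. This is only an illustration: the searches themselves were run in Scopus, and the function name `build_queries` and the literal substitution of the word 'Antarctica' are assumptions made here for clarity.

```python
# Base query as reported in the Methods. The three additional searches
# swap the term "Antarctica" for an omic keyword.
BASE_QUERY = "bioremediation AND (Antarctica OR ARCTIC OR COLD) AND hydrocarbons"
OMIC_TERMS = ["genomic", "proteomic", "transcriptomic"]


def build_queries(base=BASE_QUERY, replacements=OMIC_TERMS, target="Antarctica"):
    """Return the base query followed by one variant per replacement term."""
    queries = [base]
    for term in replacements:
        # Hypothetical helper step: plain string substitution of the target term.
        queries.append(base.replace(target, term))
    return queries


if __name__ == "__main__":
    for query in build_queries():
        print(query)
```

Keeping the queries in one place makes it easier to re-run or extend the search, for example with additional omic keywords.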

Text mining
Academic papers are being produced at an unprecedented rate, with the number of research articles published annually increasing exponentially (Fire and Guestrin, 2019). Currently, most of this information is in the form of written paragraphs, otherwise known as unstructured data (Bajocco et al., 2019). Unstructured data, while convenient for the reader, is often incompatible with current statistical and analytical methods, making unbiased and accurate assessment of trends within the literature difficult (Rao, 2003). The sheer volume of available information limits the capacity of human researchers to sift through the literature in a timely and accurate manner. To address this issue, a variety of software capable of automatic categorization and information extraction from text data has emerged (Ganino et al., 2018; Shah et al., 2020). These processes commonly result in a form of text analysis known as text mining, in which the text is broken up or 'tokenised' to uncover the prevalence of key terms or phrases (Bajocco et al., 2019; Parkavi et al., 2020; Wong et al., 2021). Here, we aim to utilize text mining to uncover trends in bioremediation research, particularly those related to the utilization of 'omic' technologies within this field. Figure 1 displays a shift in bioremediation research from 2002 to 2022. During this time, the most popular lines of inquiry shifted from a broader, macroscopic approach, represented by blue and green network links, to highly specific molecular-scale approaches, represented in yellow. This is demonstrated by the shift in research focus from bioremediation and biostimulation in field and microcosm experiments dealing with temperature, water and nutrient amendment, to sophisticated 'omic' studies investigating novel metabolic pathways and functional genes associated with biodegradation (Delille et al., 2004; Bento et al., 2005; Dias et al., 2015; Kai et al., 2020; Wang et al., 2021).
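The tokenisation step described above can be illustrated with a minimal sketch. It is not the algorithm used by VOSviewer, which identifies noun phrases. It assumes single-word terms, a small hypothetical stop-word list and invented example titles, and simply counts how many documents contain each term and each pair of terms.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately short stop-word list for illustration only.
STOPWORDS = {"the", "of", "in", "and", "a", "to", "by", "with", "for"}


def tokenise(text):
    """Lower-case the text and split it into alphabetic tokens, dropping stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_statistics(documents):
    """Count document-level occurrences of terms and co-occurrences of term pairs."""
    occurrences = Counter()
    cooccurrences = Counter()
    for doc in documents:
        terms = sorted(set(tokenise(doc)))  # each term counted once per document
        occurrences.update(terms)
        cooccurrences.update(combinations(terms, 2))
    return occurrences, cooccurrences


if __name__ == "__main__":
    docs = [  # invented titles, not taken from the corpus
        "Biostimulation of diesel contaminated soil",
        "Metagenomics of diesel degrading soil bacteria",
    ]
    occ, cooc = term_statistics(docs)
    print(occ.most_common(3))
    print(cooc[("diesel", "soil")])
```

Co-occurrence counts of this kind are the raw material for a term network: terms become nodes and the pair counts become link weights.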
The most prominent terms with an average publication year between 2010 and 2012 were biostimulation and bioaugmentation. In contrast, terms with an average publication year between 2013 and 2018 related to microbial community composition, and to the function and production of metabolites of interest, such as biosurfactants. Of the commonly mentioned genera, Rhodococcus garnered more attention in recent years, with emerging studies using metagenomics to study metabolic pathways. In Figure 1, the literature visualization tool VOSviewer was utilized to provide an overview of the trends within bioremediation research over the past 20 years. There are multiple benefits to text-mining approaches, the first being time saving, as they provide the user with the ability to generate a visual summary of hundreds of publications instantly. The side-by-side comparison of decades of research may also provide the stimulus for revisiting techniques that have fallen out of focus, such as bioaugmentation, in an attempt to explain the molecular mechanisms that cause introduced bacteria to have limited success in field-scale bioaugmentation studies (Bento et al., 2005; Stallwood et al., 2005; Ruberto et al., 2010; Kauppi et al., 2011; Watahiki et al., 2019).
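The 'average publication year' of a term can be illustrated with a short sketch. It assumes that a term's score is the mean publication year of the documents in which it occurs, and the records below are hypothetical rather than drawn from the corpus.

```python
from collections import defaultdict


def average_publication_year(records):
    """Map each term to the mean publication year of the documents containing it.

    `records` is an iterable of (year, terms) pairs, one per document.
    """
    years_by_term = defaultdict(list)
    for year, terms in records:
        for term in set(terms):  # count each term once per document
            years_by_term[term].append(year)
    return {term: sum(years) / len(years) for term, years in years_by_term.items()}


if __name__ == "__main__":
    records = [  # hypothetical documents
        (2004, {"biostimulation"}),
        (2012, {"biostimulation", "bioaugmentation"}),
        (2021, {"metagenomics"}),
    ]
    for term, year in sorted(average_publication_year(records).items()):
        print(f"{term}: {year:.1f}")
```

Colouring each term by this score is what allows older and newer lines of inquiry to be distinguished at a glance in an overlay map.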

Omic tools in bioremediation research
Prior to the emergence of 'omic' approaches, the study of microbial diversity and function was limited to those organisms that were culturable (Aislabie et al., 1998; Jamal and Penninckx, 1999; Whyte et al., 1999). Consequently, a large portion of the microbial population in any given sample was ignored (Paul, 2022). Culture-independent technologies such as amplicon sequencing, metagenomics, transcriptomics and proteomics have become ubiquitous in modern microbiology for their ability to produce accurate, high-throughput data on microbial taxonomy, function and phylogeny (Gupta et al., 2020; Paul, 2022). The presence of 'omic' tools within the sphere of bioremediation in all climates is summarized in Table 1. While the bulk of the studies incorporating 'omics' were not focused on cold climates specifically, the summary demonstrates the potential impact of these technologies in assessing the function and composition of the local microbiome as a means of assessing soil health. Of the above-mentioned 'omic' technologies, 16S rRNA amplicon sequencing has cemented its place as the workhorse of microbiology research (Evans et al., 2004; Païssé et al., 2010; Kauppi et al., 2011; Zhang and Lo, 2015; Koshlaf et al., 2019; van Dorst et al., 2021).
16S amplicon sequencing involves the amplification and parallel sequencing of the highly conserved 16S small subunit ribosomal RNA gene to differentiate organisms based on taxonomy (von Wintzingerode et al., 2002). This technology is commonly leveraged within bioremediation studies to monitor the diversity of microbial communities as well as the shifts in community structure that occur as bioremediation progresses (Païssé et al., 2010; Ruberto et al., 2010; Kauppi et al., 2011; van Dorst et al., 2020, 2021). The relatively low costs and well-established downstream bioinformatic processes associated with 16S amplicon sequencing, coupled with its utility, contribute to its current ubiquity within environmental and medical microbiology (Baraniecki et al., 2002; Wong et al., 2021). This technology is limited in that it does not provide direct insight into organism function, other than what is already known about the microbes that are identified (Gupta et al., 2020). Statistical approaches such as linear discriminant analysis (LDA) effect size (LEfSe) can help fill this gap by enabling researchers to assign specific taxa as biomarkers for desirable conditions within a bioremediation project (Xue et al., 2020; Sui et al., 2023). One potential use for this form of analysis is in testing nutrient amendments for desirable bacterial responses using only 16S data compared across experimental conditions (Xue et al., 2020; Sui et al., 2023). Crocker et al. (2019) utilized LEfSe analysis of 16S data obtained from sub-Arctic soils to identify the organisms that had the greatest response to chitin supplementation within the context of degrading hexahydro-1,3,5-trinitro-1,3,5-triazine and 2,4-dinitrotoluene. The findings indicated a rapid increase in hydrocarbon degradation by bacteria and fungi in the families Cellulomonadaceae and Mortierellaceae when supplemented with biochar (Crocker et al., 2019).
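The monitoring of community diversity from 16S data mentioned above is usually summarized with a diversity index. The sketch below assumes the Shannon index, a common choice that the studies cited here may or may not have used, and the read counts per taxon are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical 16S read counts per taxon before and after a spill.
    pristine = [250, 250, 250, 250]      # evenly distributed community
    contaminated = [940, 20, 20, 20]     # dominated by one taxon
    print(round(shannon_index(pristine), 3))
    print(round(shannon_index(contaminated), 3))
```

A drop in the index between the two hypothetical samples reflects the kind of diversity loss that amplicon surveys are used to detect, although it says nothing about which functions were lost.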
Furthermore, as microbial genomes become better annotated and functional genes within taxa are identified, analytical methods for estimating the functional capacity of a microbiome, such as functional annotation of prokaryotic taxa (FAPROTAX), have been developed (Sansupa et al., 2021; Hou et al., 2023). In the bioremediation space, it is common to manually track the abundance of known HC degraders in order to infer whether biostimulation protocols are favorable to these organisms (Gesheva et al., 2010; Gutiérrez et al., 2020; Habib et al., 2020; Kumar et al., 2020; Ruberto et al., 2005; Tribelli et al., 2018). Tools such as FAPROTAX support this practice by assigning functional capabilities to all taxa in the sample. For example, this technology has been applied to 16S and functional data during electrobioremediation of oil field soils to assess whether the diversity and abundance of organisms harboring hydrocarbon-degrading genes were greater in biochar-supplemented soils (Rushimisha et al., 2023). However, limitations of this analytical method stem from the FAPROTAX database being created from cultured representatives from marine and freshwater samples (Jung et al., 2021). Consequently, metabolic functions attributed to the aquatic cultured bacteria used to build the FAPROTAX database may not be represented by species of the same taxa present in a hydrocarbon-contaminated soil. Additionally, because the database is built entirely from cultured organisms, it cannot be used to identify functional genes residing within the so-called 'microbial dark matter' (Sansupa et al., 2021). The creation of additional databases that represent not only cultured aquatic samples but also cultured and uncultured annotated genomes of soil microorganisms would go a long way toward improving this technology (Sansupa et al., 2021; Leite et al., 2022).
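The principle behind taxonomy-based functional annotation, and its blind spot for unannotated taxa, can be shown with a minimal sketch. The lookup table below is a hypothetical stand-in and is not taken from the FAPROTAX database. The abundances and the function name `functional_profile` are likewise illustrative assumptions.

```python
def functional_profile(abundances, taxon_functions):
    """Sum relative abundance per function; return the profile and the unannotated share."""
    profile = {}
    unassigned = 0.0
    for taxon, abundance in abundances.items():
        functions = taxon_functions.get(taxon)
        if not functions:
            # Taxa absent from the lookup table contribute no functional signal.
            unassigned += abundance
            continue
        for function in functions:
            profile[function] = profile.get(function, 0.0) + abundance
    return profile, unassigned


if __name__ == "__main__":
    # Hypothetical taxon-to-function table (NOT real database content).
    taxon_functions = {
        "Rhodococcus": ["hydrocarbon_degradation"],
        "Pseudomonas": ["hydrocarbon_degradation", "denitrification"],
    }
    # Hypothetical relative abundances from a 16S survey.
    abundances = {"Rhodococcus": 0.25, "Pseudomonas": 0.25, "uncultured_lineage": 0.5}
    profile, unassigned = functional_profile(abundances, taxon_functions)
    print(profile)
    print(unassigned)
```

In this toy example half of the community receives no annotation at all, which mirrors the limitation described above for taxa without cultured representatives.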
Amplicon sequencing has been applied at both the field and laboratory scale in bioremediation processes, with prior taxonomic screening of a community being a critical first step toward developing site-specific bioremediation strategies (Ruberto et al., 2003; van Dorst et al., 2020). It was through 16S sequencing approaches that a significant loss in microbial diversity was observed both in the initial hydrocarbon contamination stage and with many forms of biostimulation (Evans et al., 2004; Silva-Castro et al., 2013; van Dorst et al., 2020), with the notable exception of biostimulation strategies that rely heavily on bulking agents such as pea-straw (Josie et al., 2014; Koshlaf et al., 2019). As demonstrated by van Dorst et al. (2014), the typically taxonomically heterogeneous Macquarie Island soils experience a significant taxonomic and functional shift when contaminated with 1,000 mg/kg total petroleum hydrocarbons (TPH), characterized by a loss of oligotrophs and ammonium oxidizers in favor of known HC-degrading microorganisms. These results provided evidence for a predictable, site-specific shift in taxonomy and function when soils experience HC toxicity, which was then used for monitoring field sites during active bioremediation (van Dorst et al., 2020). Monitoring community structure has also played a major role in shaping our understanding of another popular bioremediation approach, bioaugmentation: the introduction of a mixed or pure culture of known hydrocarbon degraders to a contaminated site (Gomez and Sartaj, 2013). Bioaugmentation has produced promising results in lab-scale studies, slightly outperforming biostimulation alone (Bento et al., 2005; Gomez and Sartaj, 2013).
However, monitoring foreign strains after their introduction into a microbial community has shown that, in many cases, the introduced consortium fails to persist in field-scale studies, reducing its likelihood of having a significant positive impact on hydrocarbon degradation (Bento et al., 2005; Kauppi et al., 2011; Gomez and Sartaj, 2013), with Watahiki et al. (2019) reporting significant reductions of R. jostii in the population 6 days post-inoculation.
Targeted next-generation amplicon sequencing allows a single gene or panel of genes to be sequenced from a sample. It is the next 'step up' from 16S amplicon sequencing in the context of functional analysis of soil, allowing for greater specificity and accuracy, for example by targeting known hydrocarbon-degrading genes such as alkB in addition to other functional genes of interest (Bewicke-Copley et al., 2019; Sansupa et al., 2021). Targeted amplicon sequencing of functional genes has been used to investigate the relationship between environmental conditions and genes linked with geochemical cycling. Certain genes can act as markers for specific groups of organisms and provide information beyond their primary function. For example, by monitoring the abundance of nifH genes in a hydrocarbon-contaminated site, researchers can gauge the degree of toxicity caused by HCs, as the organisms carrying this gene are known to be particularly susceptible to hydrocarbon toxicity (Pudasaini et al., 2019). Meanwhile, the introduction of bioavailable nitrogen to a soil system, whether from penguin guano or biostimulation protocols, has been associated with an increase in denitrifying genes such as nirS, nirK and nosZ (Jung et al., 2011; Ramírez-Fernández et al., 2021; van Dorst et al., 2021). However, increased abundance of a functional gene is not always an accurate indicator of the metabolites being produced by an organism. For example, penguin guano-impacted soils are commonly associated with high levels of N2O emission (Zhu et al., 2013; Wang et al., 2019), despite a high abundance in these soils of nosZ genes, which are responsible for reducing N2O to nitrogen gas (Ramírez-Fernández et al., 2021). Possible explanations for such a discrepancy include the target gene not being expressed or uneven reaction kinetics.
While limited in its capacity to predict microbiome function, targeted sequencing of functional genes has provided a rapid and cost-effective method of monitoring the abundance of key functional genes during and after the implementation of bioremediation strategies (Mills et al., 2003; Li and Yan, 2021).
Non-sequencing-based approaches such as quantitative PCR (qPCR) have emerged as a method for quantifying genes of interest, with the benefit of lower financial and time costs than sequencing techniques (Lasa et al., 2019; Li and Yan, 2021). The trade-off for these conveniences is less discovery power, as a smaller array of genes is searched for in a single run (Lasa et al., 2019; Li and Yan, 2021). Metagenomics involves the fragmentation and sequencing of the entirety of the genetic material in a sample (Albertsen et al., 2013; Meziti et al., 2021). From this pool of data, metagenome-assembled genomes (MAGs) can be created. The assembly of draft genomes from metagenomic data removes the need to isolate an organism in culture before sequencing its genome. Instead, whole genomes can be assembled directly from environmental samples. This reduced reliance on culturing has saved countless hours of labor and has generated draft genomes of organisms that are yet to be isolated and exist as microbial dark matter (Albertsen et al., 2013; Meziti et al., 2019, 2021). The generation of MAGs from metagenomes received some early criticism due to loss of accuracy from potential contamination of reads, mis-binning and a potential lack of read depth and coverage (Chen et al., 2020; Waschulin et al., 2022). However, these issues have not outweighed the primary benefit of bypassing the requirement to culture pure isolates prior to conducting molecular studies on microorganisms, to the extent that the generation of MAGs has begun to outpace that of isolate-derived genomes (Bowers et al., 2017).
Metagenomic techniques are gaining prominence as the 'omic' one-stop shop. Metagenomic data can provide information on taxonomic diversity, the functional potential of the microbiome and MAGs but, importantly, not on gene expression (Bao et al., 2017; Zhang et al., 2019; Dell' Anno et al., 2021). Within the context of bioremediation, it is rare to find an individual organism capable of degrading a xenobiotic to its terminal point; it is far more likely that a composite metabolic pathway is formed between several synergistic microorganisms (Khomenkov et al., 2008; Zhang et al., 2019; Gao et al., 2021). Metagenomic screening of soils and sediments is a popular method for determining the suite of organisms and specific functional genes that contribute to a metabolic pathway (Khomenkov et al., 2008; Zhang et al., 2019; Gao et al., 2021; Baek et al., 2022; Semenova et al., 2022). Gao et al. (2021) applied this technology to the bioremediation of hydrocarbons by comparing the nitrogen and hydrocarbon metabolic pathways within a target soil, finding glutamine and glutamate synthase to be key enzymes in the nitrogen cycles of HC degraders. In this instance, the new information was validated in practice, as ammonium was shown to be the most efficient nitrogen supplement. Metagenomic analysis of soils, via prediction of metabolic pathways, has the potential to inform researchers and project managers on optimal nutrient amendment. Metagenomics also offers the ability to identify bottlenecks or missing elements in an HC degradation pathway and to make tentative predictions of metabolites that could be formed during these processes (Khomenkov et al., 2008; Zhang and Lo, 2015; Zhang et al., 2019; Gao et al., 2021; Baek et al., 2022; Ibrar and Yang, 2022).
While metagenomics has grown with the promise of circumventing the bottleneck of difficult-to-culture organisms within microbial studies, it comes with its own restrictions. These predominantly relate to limitations in the bioinformatic processing of large data sets, which commonly occurs at a slower pace than the generation of sequencing datasets (Ugarte et al., 2019; Hiseni et al., 2021; Shahroodi et al., 2022). Specifically, limited reference database coverage, a lack of integration and modularity between bioinformatic pipelines, and taxonomic look-up functions are all current bottlenecks that can reduce the accuracy of results, create difficulties in upgrading and maintaining pipelines, or simply be computationally intensive (Ugarte et al., 2019; Hiseni et al., 2021; Shahroodi et al., 2022). It has also been demonstrated that MAGs at a completeness of 95% can miss up to 50% of the variable genes in a population, that is, those genes that are present in greater than 10% but less than 95% of the population (Meziti et al., 2021). While likely due to incomplete or incorrect binning of contigs, the resulting lack of sensitivity to variable genes could pose a roadblock to detecting novel variants of functional genes responsible for xenobiotic degradation (Nelson et al., 2020; Meziti et al., 2021).
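The definition of variable genes given above (present in more than 10% but less than 95% of the population) can be expressed as a short sketch. The gene names, prevalence values and MAG content are hypothetical, and the calculation is only an illustration of the definition, not the procedure used in the cited study.

```python
def variable_genes(gene_prevalence, lower=0.10, upper=0.95):
    """Return genes whose population prevalence lies strictly between the two bounds."""
    return {gene for gene, p in gene_prevalence.items() if lower < p < upper}


def fraction_missed(variable, mag_genes):
    """Fraction of the variable genes that are absent from a MAG."""
    if not variable:
        return 0.0
    return len(variable - set(mag_genes)) / len(variable)


if __name__ == "__main__":
    # Hypothetical prevalence of each gene across the genomes of a population.
    prevalence = {"core_1": 0.99, "var_1": 0.50, "var_2": 0.20, "rare_1": 0.05}
    # Hypothetical gene content recovered in a single MAG.
    mag = {"core_1", "var_1"}
    variable = variable_genes(prevalence)
    print(sorted(variable))
    print(fraction_missed(variable, mag))
```

Comparing a MAG against a population-level gene catalogue in this way is one simple check on how much accessory gene content a bin may be missing.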
While metagenomics and binning approaches provide deeper and highly specific information on the taxonomy and functional capacity of individual members within a complex microbial community, they are accompanied by higher comparative costs, a greater data burden and increased complexity of downstream bioinformatic processes (Meziti et al., 2021). These limitations consequently restrict the analysis of large metagenomic datasets to organizations with access to high-performance computing clusters, yet even with access to specialized tools, data analysis can take multiple days to a week (Tikariha and Purohit, 2019; Silva et al., 2021; Chivian et al., 2023). Nevertheless, improvements to downstream analysis pipelines and technologies are constantly occurring and with time will likely mitigate these challenges (Meziti et al., 2019, 2021; Tikariha and Purohit, 2019; Nelson et al., 2020). Bioinformatic environments such as KBase provide powerful platforms that enable the assembly and analysis of MAGs from initial reads, as well as genome analyses such as metabolic modeling and genome annotation. While these technologies are available elsewhere, the shift toward centralized tools for metagenome analysis will increase accessibility, data sharing and support, which are key factors in reducing the limitations of this technology (Nelson et al., 2020; Chivian et al., 2023).
Transcriptomics utilizes RNA fragments that correspond to DNA from gene coding regions, such as messenger RNA, ribosomal RNA and transfer RNA, through technologies such as RNA-seq, microarrays and real-time PCR (Lamas et al., 2019). Within the context of bioremediation, transcriptomics is used to determine the array of functional genes being expressed at a given time (Stallwood et al., 2005; Fida et al., 2017; Tribelli et al., 2018). For example, Novosphingobium in a bioaugmentation system was observed to enter a viable but non-culturable state in which its population did not increase (Fida et al., 2017). However, Novosphingobium continued to up-regulate genes known to be associated with hydrocarbon degradation, suggesting that significant shifts in microbiome function can occur during bioremediation that cannot be identified via taxonomic trends alone. To date, bioremediation studies leveraging transcriptomics are limited and have largely focused on xenobiotics other than hydrocarbons, where transcriptomics is a powerful tool for unraveling novel metabolic pathways (Das et al., 2020; Baek et al., 2022). Within the field of hydrocarbon bioremediation, transcriptomics has been of particular use in identifying genes associated with tolerance to polycyclic aromatic hydrocarbons (PAHs) (Ito et al., 2022; Su et al., 2022), as well as in monitoring dynamic changes in arrays of genes associated with hydrocarbon degradation (Das et al., 2020; Wang et al., 2021).
While not as abundant within the bioremediation literature as metagenomics, the use of transcriptomics in bioremediation studies is growing. Transcriptomic investigations of microbial responses to differing environments commonly report differentially expressed genes in the thousands to tens of thousands (Fida et al., 2017; Tribelli et al., 2018; Das et al., 2020), which is significantly more data than is produced by other technologies used for this purpose, such as metaproteomics (Zhao and Poh, 2008; Zhang et al., 2022). Transcriptomics has been used to uncover metabolic pathways and novel gene expression shifts in an environment independently of taxonomic shifts. For example, during exposure of Pseudomonas aeruginosa to crude oil, some HC-degrading genes, such as xylL, xylX and antA, were interestingly shown to be downregulated (Das et al., 2020). The sheer diversity of potential and known HC-degrading genes that were differentially expressed suggests that P. aeruginosa is capable of several pathways of HC degradation and of tailoring gene expression to the form of HC in its environment (Das et al., 2020). This discovery has implications for wider studies focused on bioremediation; for example, transcriptomic analysis of potential HC degraders could be useful for organism selection in bioaugmentation studies.
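The notion of up- and downregulated genes discussed above can be made concrete with a minimal sketch based on the log2 fold change between two conditions. The read counts, the pseudocount of 1 and the threshold of 1 (a two-fold change) are assumptions made for illustration. Real analyses also normalize for sequencing depth and apply statistical tests, which are omitted here.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated to control expression, with a pseudocount to avoid log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(control, treated, threshold=1.0):
    """Label a gene 'up', 'down' or 'unchanged' using a log2 fold-change threshold."""
    lfc = log2_fold_change(control, treated)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical normalized read counts: (control, crude-oil exposed).
    counts = {"gene_a": (7, 31), "gene_b": (31, 7), "gene_c": (10, 10)}
    for gene, (control, treated) in counts.items():
        print(gene, round(log2_fold_change(control, treated), 2), classify(control, treated))
```

Ranking genes by this value is a common first look at an expression dataset before more rigorous testing is applied.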
Transcriptomics also finds practical use in uncovering natural and artificial factors that may lead to suppression of key HC-degrading genes. For example, early findings in marine samples show greater upregulation of the alkB gene in hydrocarbon-degrading bacteria supplemented with biosurfactants than in those supplemented with Ultrasperse 2 (De Couto et al., 2016). Further investigation in this area could be beneficial, as the use of chemical surfactants has not been sufficiently explored in relation to their potential toxicity and how that may affect the expression of hydrocarbon-degrading genes (De Couto et al., 2016).
RNA is characterized by low stability in in vitro studies, and its lability is increased in cold environments, as many psychrophilic RNases can remain active at low temperatures (Cristescu, 2019; Gunjal Aparna et al., 2021; Hualpa-Cutipa et al., 2022). The other issue facing RNA analysis is the co-extraction of contaminants that can interfere with downstream enzymatic applications (Tveit et al., 2014). Common solutions to these complications center on amplification through PCR (Rio, 2014; Tveit et al., 2014; Cristescu, 2019). It has been demonstrated that contaminants can be essentially eliminated through dilution followed by linear amplification of the RNA template (Tveit et al., 2014). In contexts where the lability of RNA is a concern, such as when transporting samples back from remote locations, reverse transcription of RNA sequences to DNA can enhance the stability of samples without losing read depth (Rio, 2014).
Meta-proteomics involves the quantification of all intra- and extracellular proteins in an environmental sample, most commonly via high-performance liquid chromatography combined with mass spectrometry (Wilmes and Bond, 2006; Manuel, 2022). In addition to quantifying structural and intracellular proteins, proteomics provides a more accurate view of community function than transcriptomics, owing to post-transcriptional changes that can occur before protein synthesis (Schenk et al., 2019). Meta-proteomic analysis can benefit current understanding of bioremediation projects in three main ways: elucidating the effect specific biostimulants might have on enzyme activity, screening for the presence of known hydrocarbon-degrading enzymes in soil and uncovering the protein expression of a known isolate (Macchi et al., 2021; Méndez García and García de Llasera, 2021; Baek et al., 2022; Zhang et al., 2022).
Certain bioremediation treatments, such as the addition of chemical surfactants or biosurfactants, can affect protein function independently of gene expression. For example, full proteomic analysis of B. subtilis revealed a negative correlation between artificial surfactants and the function of proteins associated with alkane degradation, whereas biosurfactants did not negatively impact protein function (Zhang et al., 2022). Proteomic analysis has been the predominant technology used for identifying enzymes involved in PAH degradation (Méndez García and García de Llasera, 2021). PAHs can be more persistent in the environment and often require a more complex metabolic pathway to reach terminal oxidation (Gomez and Sartaj, 2013; Cipullo et al., 2019). In a study investigating the biodegradation of toluene in a bioelectric well, meta-proteomic analysis was used to investigate protein expression in the bulk of the reactor in addition to the anode. The study revealed that the abundance of proteins related to hydrocarbon degradation was skewed toward the bulk phase of the reactor, with proteins involved in the tricarboxylic acid cycle featuring prominently in the biofilm formed on the anode surface (Tucci et al., 2022). It is worth noting that the number of proteins identified from the anode was low, with 46 and 67 proteins identified in the respective trials. This is a common shortcoming of metaproteomic analysis: with no means of protein amplification, low protein yield from difficult samples can significantly impact the results (Tribelli et al., 2018; Tucci et al., 2022). Nonetheless, when combined with other omic technologies such as shotgun sequencing, it is likely that novel metabolic pathways can be predicted. Recently, Tucci et al. (2022) proposed a three-step electrogenic degradation pathway that involves initial digestion of toluene by anaerobic hydrocarbon degraders, followed by fermentation of the resulting unknown intermediates and then terminal oxidation by Geobacter sp. residing on the anode.
One pathway to improving the degradation of PAHs is identifying hydrocarbon degraders via in silico analysis, which can then be confirmed via shotgun proteomics (Macchi et al., 2021). Furthermore, a synthetic microbial consortium designed in this way was shown to be effective at degrading phenanthrene in liquid culture, but it is unclear whether the same results could be achieved in contaminated soils, as past bioaugmentation studies have observed a lack of persistence of inocula in situ (Bento et al., 2005;Gutiérrez et al., 2020;Macchi et al., 2021). Currently, these studies select consortia based on individual members' ability to degrade a particular hydrocarbon (Baek et al., 2022). However, as understanding of microbial interactions increases, it is likely that some microbes will be added to these consortia to support overall consortium health, thereby indirectly increasing the efficiency of hydrocarbon degraders.
The bottleneck for 'omic' studies is often the preparation of complex environmental samples for downstream analysis. Soil DNA extraction followed by 16S amplicon sequencing or metagenomic analysis has been widely utilized, with proven sample preparation methods available that have been employed in bioremediation projects (Ferrari et al., 2016;Koshlaf et al., 2019;Gupta et al., 2020;Gao et al., 2021;van Dorst et al., 2021). However, when it comes to the study of both tRNA and proteins, many of the established methodologies have focused on analyzing concentrated cell samples composed of a cultivated isolate only, often making them less suitable for direct application to environmental samples (Tribelli et al., 2018;Mukherjee et al., 2022). The aforementioned trend is also present in metaproteomic studies, where in vitro samples originating from a pure culture are more widespread than samples of ex vitro origin (Zhao and Poh, 2008;Schenk et al., 2019;Kumar et al., 2020;Macchi et al., 2021;Zhang E. et al., 2021). This imbalance has resulted in current methods of analyzing protein samples outstripping sample preparation methods (Zhao and Poh, 2008;Chourey et al., 2010;Schenk et al., 2019;Das et al., 2020;Macchi et al., 2021;Zhang et al., 2022).
The application of meta-proteomics to bioremediation within cold climates faces unique challenges. Low biological yields necessitate the concentration of samples, while substances known to interfere with gas chromatography/mass spectrometry, such as salts, DNA and humic acids, are co-extracted along with proteins (Chourey et al., 2010;Tschitschko et al., 2016;Abiraami et al., 2020). These factors in combination can limit discovery power in soil metaproteomic studies (Tucci et al., 2022). It should be noted that this limitation is less prominent in aquatic environmental samples and in in vitro studies (Tschitschko et al., 2016;Macchi et al., 2021), likely due to easier access to greater sample concentrations and reductions in detritus before any sample processing is applied (Chourey et al., 2010;Abiraami et al., 2020). The development of multiplexing protocols, such as labeling peptides with isobaric tags, has emerged as a potential solution to the issue of protein identification in complex samples and offers greater quantitative accuracy and discoverability (Creskey et al., 2022). The primary limitations of this technique are the associated high costs and the use of volatile compounds such as acetonitrile, which need to be handled with care because degradation or inaccurate transfer can lead to inaccurate sample quantitation (Creskey et al., 2022). Currently, Tandem Mass Tag (TMT) labeling has been successfully utilized to provide detailed insights into the molecular mechanisms involved in PAH degradation in both bacteria and yeasts (Zhang L. et al., 2021), although it should be noted that these studies analyzed samples grown in culture, and the potential for multiplexed labeling techniques to solve the challenges posed by soil-based samples is yet to be investigated.
The optimization of sample preparation techniques for soil is required, specifically when dealing with molecules such as proteins. Current sample cleaning methods result in protein loss and are thus antagonistic toward the need to extract more protein, whereas extracting more protein co-extracts a greater amount of contaminants, which in turn requires more sample cleaning.

5. Insights from omic technologies in the polar environment, molecular mechanisms driving hydrocarbon biodegradation
Table 2 summarizes some of the key taxa, and their HC degrading potential, that 'omic' research has unveiled when applied within the context of psychrophilic, psychrotrophic and industrially exploitable organisms in cold climate, hydrocarbon contaminated soils, and their products (Ruberto et al., 2005;Parrilli et al., 2010;Kumar et al., 2020).
The utilization of 'omic' technologies, especially regarding upstream sample preparation, is tailored toward ecological or industry focused studies. Their popularity can be attributed to their capacity to unravel the molecular mechanisms underpinning a process of interest (Jung et al., 2011;Rio, 2014;Jurelevicius et al., 2021;Méndez García and García de Llasera, 2021;Ramírez-Fernández et al., 2021;van Dorst et al., 2021). In the context of hydrocarbon bioremediation, the primary focus in the use of these technologies has been investigating the mechanisms driving alkane and PAH degradation (Jurelevicius et al., 2012;Tribelli et al., 2018;Kuc et al., 2019). When assessing the capacity of a microbiome to degrade diesel, the relative abundance of alkB, the gene encoding the alkane monooxygenase enzyme, is commonly used (Yergeau et al., 2012;Crane et al., 2018;Li et al., 2020;Ling et al., 2023). Alkane monooxygenase is the first enzyme in the pathway responsible for the terminal oxidation of alkanes and has been commonly found in hydrocarbon contaminated soil in both cold and temperate climates (Yergeau et al., 2012;Crane et al., 2018;Li et al., 2020;Ling et al., 2023). Within the context of bioremediation research, an increase in alkB is thought to be associated with the degradation of short chain alkanes, which form a large proportion by mass of Antarctic blend diesels (Yergeau et al., 2012;Kuc et al., 2019;Semenova et al., 2022). Within hydrocarbon contaminated soils it is not uncommon to find an alkB relative abundance of greater than 100 gene copies per 100 organisms, likely because many organisms carry multiple copies of this gene (Yergeau et al., 2012). Due to this confounding variable, it may not be suitable to use alkB abundance in a population as the sole indicator of alkane degradation potential in aerobic soils.
Instead, a more reliable array of molecular indicators can be assembled from the identification of a significant population of known hydrocarbon degraders, for example by targeting Rhodococcus and Pseudomonas via amplicon sequencing (Gutiérrez et al., 2020;Kai et al., 2020). These genera harbor an abundance of alk and acetyl-CoA synthase genes, which are indicators of the alkane oxidation pathway reaching its terminus (Semenova et al., 2022).
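The confounding effect of multi-copy genomes on alkB relative abundance can be illustrated with a short calculation. The sketch below is purely illustrative: the community composition, cell counts and per-genome alkB copy numbers are hypothetical values chosen for the example and are not taken from any of the cited studies. It compares alkB copies per 100 organisms with the percentage of organisms that actually carry the gene.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str
    cells: int                   # hypothetical number of organisms in the sample
    alkb_copies_per_genome: int  # hypothetical alkB copy number per genome


def alkb_per_100_organisms(community):
    """alkB gene copies per 100 organisms, summed over the whole community."""
    total_cells = sum(t.cells for t in community)
    total_copies = sum(t.cells * t.alkb_copies_per_genome for t in community)
    return 100 * total_copies / total_cells


def percent_alkb_carriers(community):
    """Percentage of organisms carrying at least one alkB copy."""
    total_cells = sum(t.cells for t in community)
    carriers = sum(t.cells for t in community if t.alkb_copies_per_genome > 0)
    return 100 * carriers / total_cells


# Hypothetical community: half of the organisms carry no alkB at all.
community = [
    Taxon("multi-copy degrader", cells=400, alkb_copies_per_genome=4),
    Taxon("single-copy degrader", cells=100, alkb_copies_per_genome=1),
    Taxon("non-degrader", cells=500, alkb_copies_per_genome=0),
]

print(alkb_per_100_organisms(community))  # 170.0
print(percent_alkb_carriers(community))   # 50.0
```

In this hypothetical community the gene-level metric exceeds 100 copies per 100 organisms even though only half of the organisms carry alkB, which is why a taxon-resolved indicator panel may be more informative than alkB abundance alone.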
The above-mentioned functional genes and enzymes have frequently been demonstrated to correlate with hydrocarbon degradation rate and 'completeness'; therefore, it may be viable to screen for their abundance as an indicator of microbiome 'fitness' within the context of degrading hydrocarbons (Yergeau et al., 2012;Crane et al., 2018;Li et al., 2020;Ling et al., 2023). Understanding the degradation pathways of PAHs is of particular importance, as analyses of functional genes in both cold and temperate soils report co-metabolism of PAHs by numerous members of the microbiome more commonly than terminal catabolism by a single organism (Muangchinda et al., 2015;Li et al., 2020;Ali et al., 2023;Sui et al., 2023). Greater diversity among the structure and potential toxicity of PAHs leads to greater complexity among the molecular mechanisms underpinning their degradation (Muangchinda et al., 2015;Li et al., 2020;Ali et al., 2023;Sui et al., 2023). A study investigating the contamination of native soil from a temperate climate with two aromatic hydrocarbons, benzene and the PAH benzo[a]pyrene (BaP), demonstrated increased biodegradation of BaP when co-contaminated with benzene compared to contamination with BaP alone; conversely, the degradation of benzene occurred faster when it was the sole contaminant (Ali et al., 2023). This result is likely due to the composite nature of metabolic pathways for PAH degradation. This phenomenon has also been observed at the level of an individual organism, such as L. fusiformis when cultured in isolation in petroleum contaminated soil. Here, the upregulation of alkB and acetyl-CoA synthase was demonstrated, suggesting the capacity for terminal oxidation of alkanes. At the same time, the upregulation of enzymes that catalyze the oxidation of aromatic compounds, such as cyclohexanone monooxygenase, was observed. Interestingly, L. fusiformis did not appear to express cytochrome P450 alkane hydroxylase, which is associated with the oxidation of medium chain alkanes; this finding was further evidenced by reduced growth rates when diesel consisting of medium chain alkanes was used as a carbon source. As biostimulation protocols can be associated with a loss of community diversity, there is a case to be made that with lower diversity some steps in the co-metabolic pathway of PAH degradation may also be lost (Evans et al., 2004;Cury et al., 2015;Van Goethem et al., 2020). However, metagenomic analyses of soil communities in both temperate and cold hydrocarbon contaminated soils have reported a high degree of redundancy among PAH degrading genes (Jurelevicius et al., 2012;Muangchinda et al., 2015;Ali et al., 2023;Lv et al., 2023). In cold climates, little is known about the degradation pathways used by anaerobic hydrocarbon degraders, despite evidence of their presence in contaminated soils with high organic carbon on King George Island (Sampaio et al., 2017). However, anaerobic degradation of the HCs toluene and hexadecane, as mediated by benzoyl-CoA reductase, was shown to be more effective than aerobic processes in a biostimulated microcosm study using spiked soils from Casey station, Antarctica (Powell et al., 2006). Despite these promising findings, anaerobic HC degradation mechanisms in cold climates are understudied. This gap in the literature is likely due to the utility of biopiles for the degradation of hydrocarbons in these climates; slower contaminant oxidation rates and higher O2 saturation in many cold climate soils are factors which highly favor aerobic processes (Delille and Coulon, 2008;Whelan et al., 2015;Martínez Álvarez et al., 2017;van Dorst et al., 2021). In contrast, anaerobic processes are better studied in warmer climates, where methanogens have been observed degrading HCs through fermentative processes (Liu et al., 2019;Madison et al., 2023).
Enzymes such as naphthyl-2-methylsuccinate synthase, naphthalene carboxylase, alkyl succinate synthase, and benzoyl coenzyme A have been shown to be significantly upregulated in anaerobic HC contaminated soils (Liu et al., 2019;Madison et al., 2023). Many of these enzymes are associated with PAH degradation, offering a potential explanation for observations that anaerobic hydrocarbon degraders outperform aerobic bacteria in degrading larger alkanes and PAHs (Cason et al., 2019;Liu et al., 2019;Madison et al., 2023).

Conclusion
Text mining is an effective method for generating a visual overview of the current literature, but the selection of keywords, as well as the extrapolation of information from them, requires careful consideration. The emergence of omic technologies has revolutionized microbiology and, as they become more widely available, multi-omic studies will enable a more complete picture of the molecular landscape. Significant progress has been made in mapping out co-metabolic pathways associated with the aerobic degradation of hydrocarbons in cold climates, and biomarkers such as alkB and acetyl-CoA synthase have been shown to correlate with terminal oxidation of alkanes. However, anaerobic processes in cold climates are still not well understood, although there is evidence that they could be utilized for the preferential degradation of PAHs. Although great strides have been made, several barriers remain before these technologies can be truly effective as a monitoring tool within the context of bioremediation. Three key impediments are the high costs of the more sophisticated metagenomic and transcriptomic approaches, the specialized facilities required for downstream analysis, and difficulty with upstream processing of soil samples. Moving forward, improvements in sample preparation will greatly benefit the utilization of meta-proteomics and meta-transcriptomics within the context of environmental samples. Such improvements should address rapid sample degradation, potentially in the form of more cost effective DNase, RNase and proteinase inhibitors, and the co-extraction of potential contaminating molecules, while reducing sample loss. In the future, decreasing costs of multi-omic studies will enable a deeper understanding of how microorganisms interact with hydrocarbons, other xenobiotics and their environment.

Author contributions
BF determined the theme and direction of this article with input from KA. KA generated the figures and tables. BF, DW, and KA wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding
This work was supported by an Australian Government Research Training Program (RTP) scholarship awarded to KA and an Australian Research Council Future Fellowship (FT170100341) grant awarded to BF.