Soil Microbiome Structure and Function in Ecopiles Used to Remediate Petroleum-Contaminated Soil

The soil microbiome consists of a vast variety of microorganisms which contribute to essential ecosystem services including nutrient recycling, protecting soil structure, and pathogen suppression. Recalcitrant organic compounds present in soils contaminated with fuel oil can lead to a decrease in functional redundancy within soil microbiomes. Ecopiling is a passive bioremediation technique involving biostimulation of indigenous hydrocarbon degraders, bioaugmentation through inoculation with known petroleum-degrading consortia, and phytoremediation. The current study investigates the assemblage of soil microbial communities and pollutant-degrading potential in soil undergoing the Ecopiling process, through the amplicon marker gene and metagenomics analysis of the contaminated soil. The analysis of key community members including bacteria, fungi, and nematodes revealed a surprisingly diverse microbial community composition within the contaminated soil. The soil bacterial community was found to be dominated by Alphaproteobacteria (60–70%) with the most abundant genera such as Lysobacter, Dietzia, Pseudomonas, and Extensimonas. The fungal community consisted mainly of Ascomycota (50–70% relative abundance). Soil sequencing data allowed the identification of key enzymes involved in the biodegradation of hydrocarbons, providing a novel window into the function of individual bacterial groups in the Ecopile. Although the genus Lysobacter was identified as the most abundant bacterial genus (11–46%) in all of the contaminated soil samples, the metagenomic data were unable to confirm a role for this group in petrochemical degradation. Conversely, genera with relatively low abundance such as Dietzia (0.4–9.0%), Pusillimonas (0.7–2.3%), and Bradyrhizobium (0.8–1.8%) did possess genes involved in aliphatic or aromatic compound degradation.


INTRODUCTION
Soil contamination is an ongoing and increasing issue that is of great concern from societal, economic, and environmental perspectives. Contaminants of the greatest concern are petroleum hydrocarbons, heavy metals, pesticides, herbicides, and chlorinated hydrocarbons (Devatha et al., 2019). Due to worldwide demand, population growth, and extensive use of petroleum products, the petrochemical industry has continued to grow and continues to pollute both aquatic and terrestrial environments. This can happen during production, transport, usage, or disposal of the oil (Ramadass et al., 2018;Baoune et al., 2019). Contamination of soil by petroleum hydrocarbons is especially problematic for a variety of physical and chemical reasons and is detrimental to ecosystems due to their lipophilic, mutagenic, and carcinogenic effects of its constituents. When the soil or sediment is contaminated with oil, liquid transfer and air diffusion within the porous spaces of the soil are hindered. This impacts the physical properties of soil such as permeability and moisture content, in addition to chemical properties such as pH and the availability of nutrients such as nitrates, phosphates, and sodium. These physical and chemical changes result in reduced growth of microorganisms and plant life within the soil (Baoune et al., 2019;Devatha et al., 2019). The physical and chemical changes caused by pollution of a pristine ecosystem occur rapidly, greatly impacting the functionality of ecosystem services in that environment. Some sensitive autochthonous microbes are quickly killed or inhibited by the petroleum hydrocarbons, reducing the robustness of the microbial community (Hou et al., 2018;Xu et al., 2018). In addition to this, plants are directly affected by factors such as contaminant toxicity, access to light, and an inability to acquire nutrients and water blocked by oil contamination. These factors significantly impact the plants' ability to grow. Without a functional microbial community and healthy plant life, the environment is unlikely to support much higher-order lifeforms (Girsowicz et al., 2018;Truskewycz et al., 2019).
Bioremediation can remove large percentages of petroleum hydrocarbons for a relatively low cost and with much less ecological impact than the chemical and physical alternatives. Bioremediation utilizes microorganisms such as bacteria and fungi, often combined with plants, to remove contaminants from the environment (Quintella et al., 2019). The nature and scale of contamination are environment specific, and bioremediation treatments may not be suitable in all cases. As some ecosystems are contaminated by weathered hydrocarbons, which are less bioavailable, it is important to devise functional and efficient microbial consortia for remediation of these sites as microbes are not guaranteed to respond to the fresh and weather hydrocarbons in the same way (Khan et al., 2018;Ramadass et al., 2018).
Total petrochemical hydrocarbon (TPH) concentrations can be a poor indicator of the actual risk associated with toxicity of the bioavailable fraction of the hydrocarbons. Due to their prevalence and intimate contact with the soil, the abundance and diversity of eukaryotic organisms such as fungi and nematodes can provide valuable insights into the toxicity of the contaminants in soils. Nematodes are the most abundant and species-rich metazoan taxon of terrestrial environments. Nematode assemblages typically comprise species with different feeding types and life strategies, as well as different levels of tolerance to changes in environmental conditions (Ekschmitt and Korthals, 2006;Wang et al., 2009). Because of their limited mobility, nematodes are unable to escape pollution events and are in direct contact with their surrounding microhabitat, both through ingestion and due to the permeability of their cuticle to many chemicals, and may react in diverse ways to contaminants. All those reasons justify the use of nematodes as sentinels for toxicity impact studies. The diversity and abundance of nematodes in soil can be a very good indicator of the degree of toxicity associated with hydrocarbon contaminants (Urzelai et al., 2000).
Metagenomics data can provide a huge amount of information on the genetic potential of microbial communities in soil to degrade a pollutant, allowing researchers to determine which community members are potentially degrading the pollutants at the time of sampling. Detection and characterization of key microbial species pinned to specific physiological pollutantdegrading processes will allow us to discover new metabolic pathways and identify previously unknown or ignored microbes that play critical supporting roles in the metabolism and cometabolism of toxic hydrocarbons in soil (Devarapalli and Kumavath, 2015).
The site used in the current study was a former industrial site in Ireland (latitude, 52.8523; longitude, −6.9270). The site was a former food manufacturing site on which pockets of mineral oilcontaminated subsoil (2-4 m below the surface) were discovered. Ecopiles had been used previously on this site in 2009 to successfully remediate TPH-contaminated soil (Germaine et al., 2015). Further site investigations in 2019 resulted in the discovery of additional areas of contamination. Approximately 6,600 m 3 of contaminated soil was excavated and stockpiled in preparation for remediation. The source of the hydrocarbons was suspected to be heavy fuel oil used in the heat generators on the site. It is likely that, over time, weathering (evaporation, leaching, and biodegradation) would have removed most of the low molecular components. The remediation target levels were set based on soil guideline values (SGV) using the United Kingdom Environment Agency's Contaminated Land Exposure Assessment (CLEA) methodology for category 4 screening levels (C4SLs). This sets a target level for various aliphatics and aromatics fractions. The site risk assessment showed that there was a possible source-pathway-receptor linkage in relation to a river running adjacent to the site (classified as a good ecological status), and therefore, the site required remediation. Ecopiling was again chosen as the desired technology to remediate the soil.
The aim of the work described here is to determine the microbial (bacterial and fungal) and nematode community composition within TPH-impacted soil at a former food industry site in Ireland. We also wanted to evaluate the hydrocarbon degradation potential of the indigenous bacterial populations by defining the genes and pathways that are present in Ecopiled soil and their taxonomic assignment. The purpose of this analysis is to use these data as a baseline to determine any temporal changes in community composition as the soil was being remediated.

MATERIALS AND METHODS
Seven samples of the stockpiled soil were analyzed for TPH and polycyclic aromatic hydrocarbon (PAH) concentration. Analysis and speciation were carried out by an accredited external commercial laboratory (ALS Environmental, United Kingdom). Briefly, TPH extraction was carried out on 10 g of soil sample using a hexane: acetone (50:50) solvent, and the extracts were analyzed using gas chromatography-flame ionization detection (GCxGC-FID). For PAH speciation, 5 g of soil samples was subjected to microwave extraction using a hexane:acetone:trimethylamine (50:45:5) solvent mixture and analyzed by GC-mass spectroscopy (GC-MS). The contaminated soil was amended with chemical fertilizer (24:5 N:P to achieve a 100:10: 1 C:N:P ratio). The soil was used to construct seven Ecopiles (Figure 1) as described in Germaine et al., 2015.

Ecopile Sampling
For each Ecopile, nine soil samples (∼200 g) were collected (3 from either side and 3 from the top). These soil samples were mixed to create one composite sample. From this composite sample, 2 subsamples were collected for analysis. Each subsample was sieved using a 2 mm sieve and analyzed for aerobic heterotrophic bacterial counts, TPH degrader counts, nitrate and phosphate content, soil pH and conductivity, and % organic matter. Samples were taken for DNA extraction and subsequent microbiome and metagenome analysis.

Processing of Soil and Total DNA Extraction
Soil samples from Ecopiles were first sieved and manually homogenized. Total DNA extraction was achieved using the FastDNA ® Spin Kit for Soil (mpbio) following the kit specifications with some modifications. Briefly, 2 g of soil was first resuspended in 5 ml NaCl 0.85% followed by 15 min of vortex. The sample was then centrifuged at 300 × g for 30 s to pellet soil debris. The supernatant was centrifuged again at 8.000 × g for 10 min, and the pellet was resuspended in 1 ml of PBS. An additional incubation with 60 μL/ml of lysozyme (1 mg/ml) at 37°C for 1 h was carried out before continuing with the extraction as described by the manufacturer. The homogenization step in the FastPrep Instrument was repeated four times, reducing the speed of the machine to 5.5.
For metagenomic analysis, total DNA extracted from the soil was sequenced at Parque Científico de Madrid (Spain) using the Illumina MiSeq 2x300 system and TruSeq kit for the shotgun library preparation, which were also carried out by the sequencing company. Raw reads were quality filtered using Trimmomatic software v0.39 (Bolger et al., 2014) to remove sequencing adapters and those reads with less than 150 nt, remaining parameters as default. Filtered and trimmed paired reads were assembled with metaSPAdes software v3.13 (Nurk et al., 2017) default parameters. Assembly quality and overall statistics were assessed using QUAST v5.0.2 (Gurevich et al., 2013).
Paired-end reads were merged using FLASH (V1.2.7) (Magoč and Salzberg, 2011). Quality filtering on the raw tags was performed under specific filtering conditions to obtain the high quality clean tag according to the QIIME (V1.7.0) quality control process (Bokulich et al., 2013;Caporaso et al., 2010). The tags were compared with the reference database (Gold database) using the UCHIME algorithm (UCHIME Algorithm) (Edgar et al., 2011) to detect the chimera sequences, and then, the chimera sequences were removed . Sequences analyses were performed by Uparse software (Uparse v7.0.10) (Edgar et al., 2011) using all the effective tags. Sequences with ≥97% similarity were assigned to the same OTUs. Representative sequence for each OTU was screened for further annotation. For each representative sequence, Mothur software was performed against the SSUrRNA database of SILVA Frontiers in Environmental Science | www.frontiersin.org March 2021 | Volume 9 | Article 624070 database (version 132) (Wang et al., 2007) for species annotation at each taxonomic rank (threshold: 0.8∼1) (Quast and Pruesse, 2013) (kingdom, phylum, class, order, family, genus, and species). To obtain the phylogenetic relationship of all OTUs representative sequences, MUSCLE (version 3.8.31) (Edgar, 2004) can compare multiple sequences rapidly. OTUs abundance information were normalized using a standard of sequence number corresponding to the sample with the least sequences. Subsequent analyses of alpha diversity and beta diversity were all performed based on this output normalized data.

Metagenome Analyses
Contigs longer than 500 bp were submitted to the SqueezeMeta v1.1.0 (Tamames and Puente-Sanchez, 2019) pipeline for functional annotation. Briefly, gene prediction was achieved with prodigal (Hyatt et al., 2010) and rRNAs using barnap (Seemann, 2014). The resulting rRNA sequences were classified using the RDP classifier (Wang et al., 2007). Diamond (Buchfink et al., 2015) searches against the taxonomic and functional databases GenBank nr NCBI database, eggNOG (Huerta-Cepas et al., 2016), and KEGG (Kanehisa and Goto, 2000) were performed. HMMER3 (Eddy, 2009) was also used for classification against the PFAM database (Finn et al., 2014). Taxonomic assignments of genes were performed using the LCA algorithm and the diamond results against the GenBank nr database. Cutoff identity values were identified for the assignment of taxonomic ranks as previously described (Luo et al., 2014;Tamames and Puente-Sanchez, 2019). Coverage and abundance estimations for each gene were calculated by mapping reads into genes using Bowtie2 (Langmead and Salzberg, 2012) and further processed with BEDTools (Quinlan and Hall, 2010) and SqueezeMeta scripts (Tamames and Puente-Sanchez, 2019) to compute average coverage and normalized TPM (transcripts per kilobase million) values for genes. The taxonomic profiling of the metagenome was based on the number of reads mapped against classified CDSs (coding DNA sequences), as reported by the SqueezeMeta analysis. Sankey network visualization was achieved using Pavian web application (Breitwieser and Salzberg, 2016). Bar plots of abundance at specific taxonomic ranks were visualized using ggplot2 R package (Wickham, 2011).
Metagenome coverage and sequence-based diversity were determined with the redundancy-based approach implemented in Nonpareil 3 software (Rodriguez-R et al., 2018). Forward filtered reads were used to calculate metagenome coverage and diversity, establishing a similarity threshold of 95%, a minimum overlapping percentage of 50%, and a maximum number of query sequences of 10,000.

Soil Conditions and Ecopile Construction
The hydrocarbon-impacted soil was generally found to be granular in nature, fine to medium sandy loam (sand 84%, silt 11%, and clay 5%), and contained 5-15% organic matter (mean 7.94 ± 3.6%) ( Table 1). The soil was found to have relatively high pH (average pH 8.9) due to the presence of lime waste from the industrial processes that had taken place on the site. This high pH, coupled with the petroleum contaminants, may explain the relatively low numbers of culturable bacteria present in the soil samples (10 6 -10 7 g −1 soil). Analysis of the soils before Ecopiling showed average TPH levels (C 10 -C 40 ) of 9,294 mg·kg −1 ( Table 2). The wide range of TPH levels (∼2,000-26,700 mg·kg −1 ) detected in the seven soil samples demonstrated the heterogeneous dispersal of the hydrocarbons in the soil. TPH analysis showed that the vast majority of the hydrocarbons present were high molecular weight compounds (C 16 -C 35 aliphatics and aromatics, but mostly aromatics).

Microbiome Analysis
Seven Ecopiles were constructed from the contaminated soil. Soil samples were taken from each of the seven Ecopiles, and DNA was extracted from two 0.25 g subsamples per Ecopile (2 technical replicates per Ecopile, 14 samples in total). These were analyzed for bacteria, fungi, and nematode microbiome analysis.
The sequence quality metrics for the 16S marker gene data are given in Supplementary Table S1. In the bacteria microbiomes, nine phyla represented 99% of the bacterial community. The remaining 1% consisted of more than 36 different phyla. Bacteria in the Proteobacteria phylum made up 60-75% of the reads. The Actinobacteria phylum was the next largest group (12-24%, an average of 17%), while Firmicutes made up 3-8% of the community. 734 different bacterial genera were identified in the TPH-contaminated soil with 489 ± 42 identified genera per Ecopile. These made up (on average) 50% of the total sequencing reads per Ecopile sample. At the genus level, the 14 samples had similar bacterial communities ( Figure 2). Given that all these Ecopiles are constructed from the same contaminated soil, this similarity in bacterial community is to be expected. 10 genera made up between 39% and 55% of the identified genera found in soil samples (an average of 46%). Lysobacter, a gram-negative genus, was the most identified individual genus across all 14 Ecopile samples (14-39%). Lysinomonas, Luteimonas, Mycobacterium, Pseudomonas, and Extensimonas typically made up 1-6% of the bacterial communities. A total of 1850 different bacterial OTUs were identified across the 14 samples, of which 1,115 OTUs (60%) were common to all 14 samples. Ecopile 5 had the most unique bacterial microbiome with 236 OTUs (12%) that were specific to FIGURE 2 | The top ten most abundant bacterial genera (relative abundance) in each of the Ecopiles (n 2). The last bar represents the average bacterial composition across all Ecopile soil samples (n 14).
Frontiers in Environmental Science | www.frontiersin.org March 2021 | Volume 9 | Article 624070 it. Analysis of the alpha diversity of the seven Ecopiles showed that there was a significant difference between the observed OTUs, Simpson, ACE, and PD whole tree indices of Ecopile 2 and Ecopile 7. There were statistically significant differences between some of the other Ecopiles. However, as there were only two replicates per Ecopile, inference of trends and differences between Ecopiles is not possible. The value of the data is only scientifically valid when looking at overall trends across all 14 samples. The sequence quality metrics for the fungal 18S marker gene data are given in Supplementary Table S2. Analysis of the fungal microbiome of Ecopile soils based on 18S rDNA sequencing found that, on average, 55% (range: 48-64%) of the sequence reads could be assigned to the kingdom Fungi. The remaining sequences were assigned to other eukaryotic soil organisms such as protists, plants, and animals. The fungal microbiome of the Ecopiles identified 7 fungal phyla in the soil. Unidentified fungal reads accounted for 5-50% of the fungal reads (an average of 17%). Reads assigned to the phyluma Ascomycota made up the vast majority (an average of 70%) of the reads in all but not in one Ecopile (Ecopile 5, 37%). At the genus level, 145 different fungal genera were identified in the TPH-contaminated soil with 66-90 different genera per sample. Figure 3 shows the most abundant fungal genera in the Ecopile soil samples. 66 of these genera belonged to the phylum Ascomycota, 39 belonged to Basidiomycota, 17 were unassignable at the phylum level, 12 belonged to Chytridiomycota, 4 belonged to Glomeromycota, 3 belonged to Entomophthoromycota, and 2 belonged to Cryptomycota. The top ten most abundant fungal genera across all Ecopile samples were Ochroconis, Fusarium, Pseudallescheria, Cochliobolus, Boeremia, Cladosporium, Powellomyces, Gaertneriomyces, Spizellomycetales, and Paramicrosporidium.  The sequence quality metrics for the nematode 18S marker gene data are given in Supplementary Table S3. On average, 6-7 different nematode genera could be identified in each Ecopile sample. Figure 4 shows the distribution of the nematode sequence reads assigned at the genus level. On average, 60% of the nematode reads were assigned to the genus Rhabditida, 25% to Diplgasteridae, and 10% to Tylenchida. With the exception of genus Monhysterida in Ecopile 1 (22% of nematode reads), the remaining genera combined made up less than 2% of the nematode reads. Sequences representing 15 individual species of nematode were identified in these soil samples and included Poikilolaimus oxycercus, Rhabditida sp., Ditylenchus destructor, Panagrolaimus detritophagus, Paratylenchus dianthus, Paralamyctes sp., Ditylenchus dipsaci, Acrostichus sp., Prismatolaimus dolichurus, Demaniella sp., Ceroglossini sp., and Aphelenchus avenae.

Metagenome Sequencing Overview and Taxonomic Profiling
A combined sample from different Ecopiles was used for DNA extraction and sequencing. After filtering and trimming, 5,477,721 paired reads (90, 11% of total raw reads) that range from 150 nt to 301 nt were maintained for subsequent assembly. The assembly resulted in 394,033 contigs longer than 500 bp (summing a total length of 362 Mbp). The longest contig was 550,098 bp in length, and the N50 parameter was 870. A total of 656,748 ORFs were predicted, of which 392,806 were annotated against KEGG database, 515,483 against COG database, and FIGURE 5 | Taxonomic profiling of the metagenome based on reads mapped against classified CDSs and metagenome diversity. (A) Sankey network of reads visualized with Pavian. The heights of the rectancles indicate the number of read assigned per taxa and rank, also indicated above/next to each taxa. The scale shows the Domain (D), phyllum (P), class (C), order (O), family (F) and genus (G) taxonomic ranks. Only a maximum of 23 taxa per rank is shown. (B) Nonpareil curve of the average metagenome coverage as a function of sequencing effort. The circle represents the coverage obtained. The line following the circle represents the projection of the fitted model. Horizontal dotted lines represent 100% (gray) and 95% (red) coverage. The model was calculated using only the forward sister of the reads pair, a similarity threshold of 95%, a minimum overlapping percentage of 50% and a maximum number of query sequences of 10,000. (C) Stacked barplot of relative abundance of reads mapping CDSs assigned to class, family and genus ranks. Taxa with a relative abundance lower than 0.05 for class and lower than 0.5 for family and genus are grouped under the "other" category.
Frontiers in Environmental Science | www.frontiersin.org March 2021 | Volume 9 | Article 624070 318,861 against Pfam database. 226 rRNAs, 59 of them corresponding to 16S rRNAs, and 3,132 tRNAs were found. Taxonomic profiling of the metagenome based on reads mapped against classified CDSs is represented in Figure 5A-C. Eukaryota and Archaea domains are irrelevant in the metagenome when comparing their abundances (1.26 and 0.92 thousand reads, respectively) to the number of reads belonging to the Bacteria domain (7.8 million reads). The relative abundance of reads assigned to the class level shows that the metagenome is clearly dominated by Gammaproteobacteria (29.90%), Actinobacteria (19.43%), Betaproteobacteria (17.12%), and Alphaproteobacteria (15.16%). The most abundant family is Xanthomonadaceae (20.70%), followed by Alcaligenaceae (12.26%) and Dietziaceae (7.21%), but other relevant families include Microbacteriaceae, Sphingomonadaceae, and Nocardioidaceae (4.63%, 3.56%, and 2.28%, respectively). Lysobacter (10.34%), Dietzia (7.20%), and Pusillimonas (4.69%) are the most represented genera. On the other hand, the estimated metagenome coverage based on sequence diversity is roughly 51%, with an estimate sequencing effort of 142 Gbp to achieve at least 95% of coverage ( Figure 5B). This was not unexpected, as similar coverages are common in metagenomes derived from environmental samples Johnston et al., 2016).

Metagenome Functional Characterization
An assessment of the metagenome functional potential was based on the distribution of KEGG metabolic pathways in the predominant families of the metagenome (Figure 6). The most abundant categories according to the number of genes include those of central metabolic pathways, such as carbohydrate and amino acid metabolism, replication and repair, membrane transport, and signal transduction. Interestingly, two categories with a prevalent representation among the dominant families are xenobiotic biodegradation and metabolism and lipid metabolism pathway categories, which, respectively, include aromatic compounds degradation and central lipid metabolism, in which the degradation of alkanes is funneled. Aliphatic and aromatic compounds are the main components of petroleum hydrocarbons. Therefore, we searched the metagenome to identify genes and pathways involved in the degradation of these compounds and assign them to specific populations.

Metabolism of Alkanes
The results of the identification of genes encoding alkanedegrading enzymes in the metagenome are summarized in Figure 7 and Supplementary Table S4. AlkB (alkane 1monooxygenase) is a nonheme iron integral membrane protein that carries out the initial hydroxylation of C 5 -C 12 n-alkanes, oxidizing them to 1-alkanols (Kok et al., 1989;Van Beilen et al., 1994;Van Beilen et al., 2003). Although almost half of the 55 copies of alkB identified within the metagenome have not been assigned to the family level, the ones which have been classified belong predominantly to Nocardioidaceae (eight copies predominantly assigned to the genus Nocardioides), Immundisolibacteraceae (five copies assigned to the genus Immundisolibacter), Moraxellaceae (five copies), and Nocardiaceae (four copies assigned to the genus Williamsia). LadA (long-chain alkane monooxygenase) also participates in Frontiers in Environmental Science | www.frontiersin.org March 2021 | Volume 9 | Article 624070 9 the terminal oxidation pathway of n-alkanes. LadA is a flavoprotein monooxygenase that utilizes dioxygen to insert an oxygen atom into long-chain alkanes (Li et al., 2008;Ji et al., 2013). Seventeen of the 42 copies of ladA identified in the metagenome belong to the Microbacteriaceae family, with seven of them assigned to the genus Microbacterium. On the other hand, short-chain alkanes (<C 5 ) are usually metabolized by methane monooxygenases (MMOs), which are responsible for the first step in their catabolism. Genes encoding the three polypeptides that constitute the pMMO (pmoABC) have been found in the metagenome, while only the α subunit of the hydroxylase component (mmoX) of sMMO has been identified ( Figure 7). Moreover, mmoX has been found three times in the metagenome (two of them assigned to the Methylocystaceae family), while copies of the pmoABC cluster (six of pmoA, nine of pmoB, and 13 of pmoC) are distributed mainly within Ectothiorhodospiraceae (three pmoA, three pmoB, and seven pmoC), Nitrosomonadaceae (one pmoA, two pmoB, and two pmoC), and Methylocystaceae Frontiers in Environmental Science | www.frontiersin.org March 2021 | Volume 9 | Article 624070 10 (two pmoC) families. The limited presence of sMMO genes compared to pMMO is in agreement with previous works, since whereas pMMO is ubiquitous in methanotrophic bacteria, sMMO has been isolated from only certain strains of methanotrophs (Hanson and Hanson, 1996). In addition, the prmABCD gene cluster encodes the components of a dinuclear iron-containing multicomponent monooxygenase that plays an essential role in propane oxidation via subterminal oxidation to 2-propanol and then further to acetone (Kotani et al., 2003;Kotani et al., 2007). This prmABCD gene cluster has a discrete presence in the metagenome, with a single copy of all the subunits (except prmB, three copies), most of which are only classified at the class level, belonging to Actinobacteria ( Figure 7A). Finally, another gene cluster, acmAB, encodes a Baeyer-Villiger monooxygenase and esterase, involved in the metabolism of the acetone derived from propane oxidation (Kotani et al., 2007). Their abundance is also limited to five and ten copies, respectively, assigned mainly to Actinobacteria, Gammaproteobacteria, and Alphaproteobacteria.

Metabolism of Aromatic Compounds
Ring-hydroxylating dioxygenases and monooxygenases identified within the metagenome are shown in Figure 8 and Supplementary  Table S5. Biphenyl 2,3-dioxygenase (encoded by bphAa, found nine times in the metagenome, three of them belonging to the genus Immundisolibacter) and toluene monooxygenase (encoded by tmoA, found two times) are involved in the conversion of biphenyl and toluene, respectively, to benzoate. The benzoate ring is further oxidized by benzoate 1,2-dioxygenase (encoded by benA, with 102 copies identified, 21 of them belonging to the genus Immundisolibacter, five to genus Pusillimonas, and four to genus Sphingopyxis) and hydroxybenzoate 3-monooxygenase (encoded by pobA, with 66 copies identified with eight of them assigned to Microbacteriaceae family, six to Alcaligenaceae, and five to Hyphomicrobiaceae). Naphthalene 1,2-dioxygenase (encoded by nahAc) was found to be present only once in the metagenome. This enzyme is implicated in a wide range of different aromatic substrates degradative reactions (Resnick et al., 1996;Ferraro et al., 2017). These reactions include the conversion of naphthalene to salicylate, which is further hydroxylated by salicylate hydroxylases (18 copies of nagG and 80 copies of nahG were found in the metagenome, principally belonging to Alcaligenaceae, Comamonadaceae, Hyphomicrobiaceae, Microbacteriaceae, Phyllobacteriaceae, and Sphingomonadaceae families). Other genes encoding relevant ring-hydroxylating enzymes also found in the metagenome include those involved in the degradation of ethylbenzene (21 copies etbA), phenanthrene (three copies of nidA), phenol (one copy of poxA), and terephthalate (seven copies of tphA2, five of them belonging to Comamonadaceae family). Furthermore, ring cleavage  Table S5).
Benzoate aerobic mineralization via the benzoyl-CoA pathway first involves the ligation of acetyl-CoA to benzoate by benzoate-CoA ligase (encoded by badA) and 4-hydroxybenzoate-CoA ligase (encoded by hbaA gene), which is further epoxidated by benzoyl-CoA 2,3 epoxidase (encoded by boxABCD gene cluster) (Rather et al., 2010). The enzymes responsible for this benzoate degradation pathway have been identified in the metagenome (45 CDSs of badA, 4 of hbaA, 20 of boxA, 31 of boxB, 28 of boxC, and 14 of boxD) (Figure 8; Supplementary Table S5), with most of the genes belonging to Alcaligenaceae and Comamonadaceae families. Figure 9 summarizes the implication of the main families and genera in the different pathways for the degradation of TPHs according to the metagenomic data. It is interesting to note that several families that are relatively abundant do not seem to participate in TPHs degradation, since no genes implicated in these pathways have been found within these families. These families include Alcanivoracaceae, Sphingobacteriaceae, Acidimicrobiaceae, Oxalobacteraceae, Rhodobiaceae, Rhodanobacteraceae, Intrasporangiaceae, Micrococcaceae, and Planococcaceae. Since many of these families are among the major constituents of soils microbiota (Delgado-Baquerizo et al., 2018), they should be performing other soil functions not related with TPHs degradation.

DISCUSSION
The introduction of inorganic and organic pollutants such as heavy metals or petroleum hydrocarbons can adversely affect both the diversity of microorganisms present in the soil and their ability to carry out these essential ecosystem roles. Organic compounds, which dominate the complex mixture of molecules represented by petroleum hydrocarbons, can provide certain soil microbes with a source of carbon for their respiration (Sima et al., 2019). Consequently, microbial communities in soils are essential in the degradation of contaminants, and the key to successful microbial bioremediation is to harness the naturally occurring catabolic capability of microbes to catalyze transformations of soil pollutants. Microbial bioremediation is a cost-effective and ecofriendly technology that provides sustainable ways to clean up contaminated soils (Ying and Wei 2019).
The current study utilized molecular tools to profile the bacterial, fungi, and nematode microbiomes present in soil contaminated with petrochemicals. High throughput sequencing of microbial communities allows for the detection of both culturable and unculturable microorganisms. This, in turn, allows for the possibility of identifying rarely studied microbes which are actively involved in decontamination of the soil and which can be further used to optimize the existing bioremediation strategies and to develop new ones (Stenuit et al., 2008).
16S rDNA analysis of the soils used in our study found high abundances of bacteria classified to Proteobacteria, Actinobacteria, and Firmicutes, and this is similar to previous reports on the bacterial microbiome studies of petrochemical-contaminated soils (Jung et al., 2016;Siles andMargesin, 2018 andBorowik et al., 2019). Lysobacter made up the largest individual genus across all 14 Ecopile samples (14-39%). This taxon has previously been reported as the most dominant genus in diesel-contaminated soils (Borowik et al., 2019), where it is observed that Lysobacter brunescens comprised as much as 29.9% relative abundance in soil samples. While we were unable to link this genus with any pollutant degradation ability, it has been suggested that abundant taxa play a dominant role in the stability and maintenance of soil microbial communities, while rare taxa served as a diverse pool to enhance both bacterial resilience and resistance under environmental disturbances (Jiao et al., 2019). With the exception of commonly reported genera such as Pseudomonas, Sphingomonas, and Mycobacteria, many of the other genera present in the Ecopile soil (taxa such as Glutamicibacter, Paeniglutamicibacter, Extensimonas, Luteimonas, and Lysinimonas) are rarely reported as dominant members of the soil community. The genus Extensimonas, for example, is a member of the family Comamonadaceae, which belongs to the Betaproteobacteria class. This family contains numerous genera, including Acidovorax, Brachymonas, and Comamonas, which are widely reported in contaminated soil microbiomes. Extensimonas, on the other hand, has rarely been reported in soil microbiomes, and there has only been a handful of studies on their involvement in hydrocarbon degradation (Zhang et al., 2013;Gauchotte-Lindsay et al., 2019). This soil therefore potentially presents a rich source of unusual species with potentially novel hydrocarbon-degrading abilities.
The mycobiome structure in the Ecopile soil mainly consisted of members of Ascomycota and Basidiomycota and a high proportion of unclassified fungal reads, which is consistent with studies of similar contaminated soils. Unsurprisingly, Ascomycota was by far the most dominant fungal phylum in each of the Ecopile soil samples. Ascomycota has been reported as the largest fungal phylum with over 64,000 identified species in almost 6400 genera and the most dominant fungal phylum in soils. Siles and Margesin (2018) reported a fungal profile in which Ascomycota made up 0.6-95.3% and Basidiomycota made up 0.7-62.7% of the soil mycobiome. Consistent with the findings in the current study, they also reported a very high incidence of unclassified fungal reads. Wang et al., (2018) also found that Ascomycota was the dominant fungal phylum (∼71%) in crude oil-contaminated soil and that the abundance of this phylum increased in the presence of oily sludge. The dominant fungal genera found in the Ecopile soil were Ochroconis (∼15%), Pseudallescheria (9%), Gaertneriomyces (9%), and Fusarium (7%), many of which have been reported to be present in contaminated soils. Borowik et al., (2017) showed that hydrocarbon-polluted soil was dominated by fungi from the genera Fusarium (37.9%), Candida (13.8%), Microsporum (13.8%), and Penicillium (13.8%), while Ogbonna et al., (2020) identified a number of hydrocarbon-degrading fungi in oil-contaminated soil including some species of Fusarium, Penicillium, Kodamaea, and Lentinus. Wu et al., (2020a) and Wu et al., (2020b) reported the presence of Ochroconis species in two different petroleumcontaminated soils in China at relative abundances of 1-5%. Aranda et al., (2015) reported the ability of Pseudallescheria isolates to degrade PAHs, and Covino et al., (2015) found that a species of Pseudallescheria was highly efficient (71% removal rates) at degrading aliphatic hydrocarbons from clayey soils. The presence of these groups of fungi in the Ecopile soil suggests that they have a possible role in hydrocarbon degradation.
Nematode communities offer a potent ecological tool for assessing soil disturbances and for the development of a biomonitoring system. There are many records of specific nematode species showing a preference for certain environmental factors as well as many studies on changes of nematode community composition in response to soil pollution (Urzelai et al., 2000). In our analysis, the primer set used to analyze the nematode communities was not specific to the phylum Nematoda. On average, just 41% of the reads were assigned to this phylum, although the percentage of reads assigned differed significantly from sample to sample (12-69%). The presence of 15 different nematode species in the Ecopile soil samples was identified, with representative species that were bacteriovorous, fungivorous, insect parasitic, and plant parasitic. Blakely et al., (2002) also observed nematodes from all tropic levels in soils that were contaminated with PAHs. Contrary to their predictions, PAHs affected nematodes communities indirectly by altering the physical habitat of the nematodes rather than the direct chemical toxicity of PAHs. Increased bulk density of soil and hydrophobic properties of PAHs decrease the amount of habitable space within soil pores affecting the mobility and hence the distribution of nematodes. Monteiro et al., (2018) studied the effect of crude, diesel, or motor oil on the survival of free-living nematodes in soil. They observed that most species exhibited moderate to extreme mortality levels in oil treatments and experienced an increased mortality with time. Species sensitivities to oil did not follow patterns of taxonomic relatedness, contradicting the idea that closely related species should intrinsically respond similarly to pollution. Out of 6 species of Rhabditidae tested, only Bursilla monhystera was highly tolerant, while C. elegans and cryptic species of Litoditis marina were among the most sensitive taxa. In contrast to their study, 60% of the nematode reads in the Ecopile samples were from the genus Rhabditida. The dominance of this group of nematodes in the Ecopile soils samples may be due to the fact that the hydrocarbon contaminants mainly consisted of aged high molecular weight compounds which tend to be less toxic than freshly contaminated soil. Monteiro et al., (2018) recommended that future effect studies do not focus on a single model species but instead incorporate multiple species for a better and more robust assessment of pollutant effects.
Metagenomic analysis presented here has shown that bacteria are the main constituents of the Ecopiles soil community, representing most of the biomass. The composition of the bacterial community at the Ecopiles soil, according to metagenomic analysis, is consistent with the composition determined in different Ecopiles by analyzing 16S amplicons, and the same genera and families were detected. Furthermore, the major genera identified are bacteria previously shown to participate in the degradation of organic pollutants. Members of the genus Lysobacter, the most abundant genus in both analyses, have been previously isolated from pesticide sludge (Ye et al., 2015), chlorothalonil-contaminated soil (Wang et al., 2011), and oil-contaminated soil (Chaudhary et al., 2017). Bacteria affiliated to other abundant genera, such as the genus Dietzia, have also been isolated from petroleum-contaminated soils (Von der Weid et al., 2007;Bihari et al., 2010) and oil reservoirs worldwide (Bødtker et al., 2009;Wang et al., 2011), and several Dietzia strains described to date have been shown to degrade a wide variety of aliphatic hydrocarbons with chain lengths that range from C 6 to C 40 (Bihari et al., 2010;Wang et al., 2011), aromatic compounds (Von der Weid et al., 2007Fathi et al., 2016), and conventional crude oil (Wang et al., 2011). Oil-degrading bacteria belonging to the genus Pusillimonas have been found in petroleum-contaminated sites (Cao et al., 2011;Obi et al., 2016), and the ability of some members of this genus to metabolize alkanes and aromatic compounds has been reported (Stolz et al., 2005;Li et al., 2013;Obi et al., 2016).
The identification and phylogenetic assignation of genes encoding the main enzymes participating in each of the pathways have allowed us to model the main degradation routes for aliphatic and aromatic compounds. Regarding aliphatic hydrocarbons degradation, the main role in degradation appears to be performed by bacteria from the Ectothiorhodospiraceae, Microbacteriaceae, and Nocardioidaceae families and to some extent by the Immundisolibacteraceae, Nocardiaceae, Moraxellaceae, and Sphingomonadaceae families. The relevance of Ectothiorhodospiraceae can be attributed to the degradation of short chain alkanes since most of the pmoABC genes found in the metagenome were classified to the Ectothiorhodospiraceae family and have been further assigned to "Candidatus Macondimonas" at the genus level. "Candidatus Macondimonas" was proposed by Karthikeyan et al., (2019) as a novel genus within Gammaproteobacteria. The type strain, "Ca. Macondimonas diazotrophica" KTK01 T , was isolated from crude oil-contaminated sediments. The limited presence of sMMO genes compared to pMMO is in agreement with previous works, since whereas pMMO is ubiquitous in methanotrophic bacteria, sMMO has been isolated from only certain strains of methanotrophs (Hanson and Hanson, 1996). The other families seem to participate in the degradation of medium to long chain alkanes. Their role in alkane degradation is not unexpected. The isolation of bacterial strains with hydrocarbon-degrading capabilities from oil-contaminated sites has been reported for Nocardioides (Jung et al., 2002), Williamsia (Obuekwe et al., 2009), and Microbacterium (Ali et al., 2020) genera. Moreover, regarding the genus Nocardioides, genes encoding alkane monooxygenases have been previously identified in several strains (Kimbrel et al., 2013;Brown et al., 2017). On the other hand, the Immundisolibacter genus was first described by Corteselli et al., (2017), and I. cernigliae T3.2 T was isolated from a PAH-contaminated bioreactor-treated soil and described as a high-molecular-weight PAH-degrading bacterium. Although the presence of genes encoding alkane monooxygenases in I. cernigliae has not been reported before, the analysis of a metagenome, derived from a polluted sediment dominated by I. cernigliae, showed several copies of putative n-alkane-degrading genes alkB and ladA (Gacesa et al., 2018).
Regarding the degradation of aromatic compounds, only five families appear to be participating in peripheric (upper) pathways. These families are Alcaligenaceae, Sphingomonadaceae, Caulobacteraceae, Comamonadaceae, and Immundisolibacteraceae, all of them belonging to the Proteobacteria phylum. The majority of the formally described genera of aromatic hydrocarbon-degrading bacteria belong to the large Proteobacteria phylum (Prince et al., 2019).
Within Betaproteobacteria, Alcaligenaceae and Comamonadaceae are the most representative families harboring aromatic compound-degrading genes identified in the metagenome. Regarding the Alcaligenaceae family, the classification of several copies of different genes belonging to both peripheral and central pathways and the benzoyl-CoA pathway to the genera Pusillimonas and Eoetvoesia are significant ( Figure 8). Moreover, the genus Pusillimonas is one of the most represented genera within the general taxonomic profile of the metagenome, and its aromatic hydrocarbon-degrading capability has been previously described (Stolz et al., 2005). The presence of the genus Eoetvoesia in petroleum-contaminated soil has also been reported (Liu et al., 2020). On the other hand, most genes assigned to the Comamonadaceae family have not been further classified at the genus level, but it is worth mentioning that several copies of different genes involved in the three types of pathways represented belong to this family. Some of the most common genera within Betaproteobacteria which are known to degrade hydrocarbons and related substituted molecules under aerobic conditions belong to Comamonadaceae family, including Acidovorax, Comamonas, Delftia, and Polaromonas (Tan and Parales, 2019). Furthermore, in a study reported by Mukherjee et al., (2017), Comamonadaceae was found to make significant contribution to the abundance of hydrocarbon degradation genes in oil sand tailing ponds samples.
Within Alphaproteobacteria, Sphingomonadaceae and Hyphomicrobiaceae are the most representative families. Members of Sphingomonadaceae have been known for many years as degraders of aromatic hydrocarbons (many belonging to Sphingomonas, Sphingobium, Novosphingobium, and Sphingopyxis genera) and particularly of PAHs in contaminated soil environments (Kertesz et al., 2019). Specifically, Sphingopyxis was found to be one of the genera with a greater number of genes encoding aromatic-degrading enzymes identified in the metagenome (Figure 8). The genetic potential for the degradation of aromatic compounds of Sphingopyxis has been previously analyzed (Yang et al., 2020).
The genes assigned to Hyphomicrobiaceae family mostly belong to the genus Devosia. The growth of some strains of Devosia with aromatic compounds as the sole carbon and energy source has been previously reported, for instance, with naphthalene  and fluoranthene (Zhao et al., 2016).
Within Gammaproteobacteria, Immundisolibacter is the most representative genus regarding aromatic compound degradation found in the metagenome, although it should be noted that all the genes belonging to this genus are only involved in peripheral pathways. The PAH-degrading capability of Immundisolibacter has been previously described (Corteselli et al., 2017).
In addition to the Proteobacteria phylum, another relevant taxon in the metagenome is Actinobacteria and within this phylum, the Microbacteriaceae family.
Specifically, Microbacterium is one of the genera present in the metagenome with more aromatic hydrocarbon-degrading genes. The degradation of different PAHs by Microbacterium members has been reported (Salam et al., 2014;Jin et al., 2017). The presence of these genes within the soil microbiome suggests that the indigenous soil microflora have significant potential for degrading the petroleum-derived contaminants.

CONCLUSION
Microbe-assisted phytoremediation promises to be the most sustainable and environmentally friendly approach for removing organic contaminants from polluted soils/sediments. Ecopiling utilizes both biostimulation of the indigenous pollutantdegrading microbial community and microbe-assisted phytoremediation to remove petroleum hydrocarbons and PAHs from contaminated soils. Our analysis of the initial Ecopile soil found that it contained diverse bacterial and fungal communities, but these communities were dominated by a small number of genera. Lysobacter was the single most abundant bacterial genus across all 14 soil samples (based on 16S microbiome data). This was also observed in the metagenome data where 10% of the CDSs were assigned to this genus. Although our ability to draw conclusions is limited due to the low genome coverage, this group does not seem to have a major role in hydrocarbon degradation as no genes involved in aliphatic or aromatic catabolic pathways were assigned to this group. On the other hand, less abundant genera of bacteria such as Pusillimonas, Dietzia, and Bradyrhizobium were found to be associated with hydrocarbon degradation genes in the metagenome. These groups represent possible novel targets for isolation and development into efficient consortia for bioaugmentation of hydrocarbon-contaminated soil.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: <br>https://www.ncbi.nlm. nih.gov/. Sequence data generated in this study are deposited in the National Centre for Biotechnology Information (NCBI) under the bioproject number PRJNA664743. Bacterial 16S data sets are archived under biosample numbers SAMN16231369-16231382, fungal 18S data sets under SAMN16233626-16233639, and nematode 18S data sets under SAMN16234236-16234249. Metagenomic sequence data have been deposited in the NCBI Sequence Read Archive (SRA) under the accession number SRR12703469. </b>.

FUNDING
This project has received funding from the European Union's Horizon 2020 research and innovation programme "GREENER" under grant agreement number 826312 and the Institute of Technology Carlow Postgraduate Presidents Awards.

AUTHOR CONTRIBUTIONS
MW, DG-S, PS-L, and MR-N carried out the microbiome and metagenome analysis. RC and RM preformed the physicochemical analysis. XL and KG set up the Ecopiling experiment. MM, DD, RR, XL, and KG were involved in the experimental design and preparation of the manuscript.