Genetic and Functional Diversity of Bacterial Microbiome in Soils With Long Term Impacts of Petroleum Hydrocarbons

Soil contamination with petroleum, especially in the area of oil wells, is a serious environmental problem. Restoring soil subjected to long-term pollution to its original state is very difficult. Under such conditions, unique bacterial communities develop in the soil that are adapted to the contaminated conditions. Analysis of the structure and function of these microorganisms can be a source of valuable information with regard to bioremediation. The aim of this study was to evaluate structural and functional diversity of the bacterial communities in soils with long-term impacts from petroleum. Samples were taken from the three oldest oil wells at the Crude Oil Mine site in Węglówka, Poland; the oldest was established in 1888. They were collected at 2 distances: (1) within a radius of 0.5 m from the oil wells, representing soil strongly contaminated with petroleum; and (2) 3 m from the oil wells as the controls. The samples were analyzed by 16S rRNA sequencing and the community level physiological profiling (CLPP) method in order to better understand both the genetic and functional structure of soil collected from under oil wells. Significant differences were found in the soil samples with regard to bacterial communities. The soils taken within 0.5 m of the oil wells were characterized by the highest biodiversity indexes. Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria were strongly correlated with biological activity in these soils. Families of Alphaproteobacteria were also dominant, including: Bradyrhizobiaceae, Rhizobiaceae, Rhodobacteraceae, Acetobacteraceae, Hyphomicrobiaceae, and Sphingomonadaceae. The study showed that the long term contamination of soil changes bacterial communities and their metabolic activity. Even so, natural bioremediation leads to the formation of specific groups of bacteria that actively grow at the site of contamination in the soil.


INTRODUCTION
The steadily increasing pollution of soil, air and water caused by industrial activity, inefficient metal recovery methods, and agricultural chemicalization poses a major threat to human health and to nature in general (Semple et al., 2006; Child et al., 2007; Zhong et al., 2011). With regard to soil pollution, it is necessary not only to create an efficient monitoring system but also to develop economical and efficient techniques for treating and immobilizing toxic compounds at the place of their deposition, thus preventing more widespread environmental pollution (Gałązka et al., 2012; Nwaichi et al., 2015; Silva et al., 2015). A number of technologies enable the deactivation or removal of toxic substances from a substrate, in most cases based on physicochemical extraction methods (Sutton et al., 2013; Gałązka and Grządziel, 2016). Unfortunately, their application is associated with extremely high costs and the complete elimination of soil microorganisms. The restoration of biological activity in these areas is very difficult and almost always requires human intervention (Doong and Lei, 2003; Khan, 2005; Vazquez et al., 2013). Therefore, rebuilding near-natural ecosystems in such cases is an extremely long and expensive process.
Long-term soil pollution in oil fields is a serious environmental problem, and restoring the soil to its original state is very difficult. Under such conditions, unique bacterial communities adapted to the contamination develop in the soil. Long-term contamination causes an accumulation of a variety of petroleum products in the soil, including both polycyclic aromatic hydrocarbons (PAHs) and aliphatic hydrocarbons. This accumulation negatively influences both the biodiversity of microorganisms and soil function. It is especially visible in the areas of oil wells, where the accumulation has been going on for many years. Such constantly contaminated soil has no chance for effective remediation; however, even in these unfavorable conditions there are groups of active microorganisms that are able to dwell in the soil. Decomposition of complex mixtures, such as petroleum, can be performed by mixed microorganism cultures (microbial complexes) with diverse activities and the ability to use hydrocarbons as a source of carbon and energy (Hartmann and Widmer, 2006; Meng et al., 2011; Gałązka and Gałązka, 2015). Typically, biodegradation of crude oil derivatives occurs via cometabolism, which consequently plays a very important role in the bioremediation process. In this case the hydrocarbons are not a source of carbon and energy but co-substrates, and their degradation occurs sequentially with the participation of different groups of microorganisms. At present, cometabolism is considered to be one of the most important mechanisms in the transformation of PAHs in soil (Doong and Lei, 2003). For example, parathion is cometabolized by Pseudomonas stutzeri to 4-nitrophenol and diethylphosphate, and phenol is then used as a source of carbon and energy by P. aeruginosa. Often, phenol or toluene is used as a co-substrate for compounds resistant to biodegradation.
However, this does not mean that an effective biodegradation process requires only a co-substrate and the most active strains. The situation is more complex, as it is often necessary to introduce bacteria that are almost inactive during decomposition but may facilitate the enzymatic activity of other species (Dellagnezze et al., 2016). Enzymes synthesized by individual strains of bacteria can provide hydroxylation, oxidation, denitrification, deamination, hydrolysis or acylation reactions that complement each other in forming complete pathways of contaminant mineralization (Child et al., 2007; Naether et al., 2012; Chen et al., 2017).
Therefore, it is very important to know precisely the genetic structure and function of microorganisms at contaminated sites, both for a greater understanding of the processes of bioremediation and as a source of inoculum in managing such areas (Gomez et al., 2004; Mishra and Nautiyal, 2009; Peng et al., 2015). Re-establishing soil microbial communities is essential, as they are responsible for physiological and metabolic processes of great importance for soil quality (Ranjard et al., 2003; Bundy et al., 2009; Lauber et al., 2009; Rutgers et al., 2016). Studies of bacterial communities in contaminated soil are enhanced by recent advances in genomics, transcriptomics and proteomics. Methods used in such studies include functional bacterial fingerprinting (Biolog EcoPlates System) and Next Generation Sequencing (NGS). NGS of hypervariable regions, such as those in bacterial 16S rRNA genes, allows one to determine the genetic diversity of microorganisms within a population without the need for cell culture (Xu, 2010; Malla et al., 2018; Pichler et al., 2018). The PCR-based method has been successfully applied to identify microorganisms acting as pollutant degraders in soil, such as naphthalene, salicylate or benzoate degraders from the class β-proteobacteria (Friedrich, 2006; Kozich et al., 2013; Sarkar et al., 2016). In addition to determining the diversity of microorganisms, metagenomic analysis can also be used to search for functional genes related to pollutant degradation, and so to prove that a specific metabolic activity is occurring among the members of the microbiome (Bartram et al., 2011; Rosselli et al., 2016). In order to obtain as much information as possible about the functional and genetic structure of the soils in this study, two methods were used: the NGS technique (V3-V4 16S rRNA gene region) and the community level physiological profiling (CLPP) method.
Evaluating bacterial functional and structural diversity in soil taken directly from the contaminated site is important because it characterizes a natural bioremediation process (Hartmann and Widmer, 2006). Unfortunately, there is still a lack of relevant indexes to assess soil quality. Although a number of indicators have been used to evaluate nonagricultural soil quality, no universal formula has been developed so far (Wyszkowska and Kucharski, 2005; Bastida et al., 2006, 2008; Dawson et al., 2007; Wang et al., 2010; Wyszkowska et al., 2015). Included among available indicators is the soil quality index, defined as the smallest set of soil parameters that can provide information about soil quality and its ability to perform certain functions (Nannipieri et al., 2003). The parameters of this index are: pH, organic matter content, microbial biomass, enzymatic activity, and respiratory performance. One of the better-known quality indexes is the microbiological degradation (MD) Index, which was developed for semiarid degraded soils and takes into account dehydrogenase and urease activities, respiration, water-soluble carbon, and water-soluble carbohydrates (Bastida et al., 2006). Another index is the Soil Quality Index (Dawson et al., 2007), whose parameters are microbial biomass carbon, dehydrogenase activity, seed germination, respiration, and earthworm toxicity. Chen et al. (2017) presented the Integrated Pollution (IP) Index, which included measures of several heavy metals. So far, none of these sets of parameters has been identified as a universal molecular indicator of soil quality. Such indicators could be extremely useful in determining the role of specific microorganisms in the soil and in selecting key enzymes related to improving the functioning of the soil (Ros et al., 2008).
The aim of this study was to evaluate the functional and structural diversity of bacterial communities in soils with long-term impacts from contamination with petroleum. The assessment was conducted on the basis of distance from direct contamination (soil taken within 0.5 m of the oil wells and at a distance of 3 m). The research hypothesis assumes that significant differences in the functional and structural diversity of microorganisms will be observed between the soils. Determining the function and quality of bacteria in contaminated soil is the basis for further research to select strains active in bioremediation. Such bacteria could then be used by industry in future bioremediation processes.

Soil Samples
Soil samples were collected in July 2017 according to the methods of Polish Standard (1998). The soils (light loamy sand; soil texture by Casagrande's method) were taken from the area of oil wells in Węglówka near Krosno (Podkarpackie Voivodeship, Poland). The soils had been contaminated with petroleum over the long term. Samples were taken from the 3 oldest oil wells at the Crude Oil Mine site in Węglówka, Poland, the oldest of which dated from 1888 (Figure 1). The Global Positioning System (GPS) locations of the oil wells are given in Table 1. Soil samples were collected at two distances: within a radius of 0.5 m of the oil wells (OWP, Oil Well Petroleum: OWP1, OWP2, and OWP3) and at a distance of 3 m from the oil wells (OW, Oil Well: OW1, OW2, and OW3). The soil samples were taken from the 0-20 cm layer in three replicates and passed through a 2 mm sieve. The samples were then stored in a refrigerator (4 °C) until they were analyzed. The basic chemical, biochemical and microbiological properties of the contaminated soil were determined. The basic chemical properties determined were: pH (PN-ISO 10390:1997), total organic carbon (Corg, using Tiurin's method) and total Kjeldahl nitrogen content (Ntotal, using flow spectrometry, wet sample mineralization). In addition, the total content of petroleum hydrocarbons and the PAH levels were determined by gas chromatography, while the content of trace elements after microwave treatment with aqua regia was assayed using inductively coupled plasma mass spectrometry (ICP-MS). Moreover, the functional and genetic bacterial diversities were analyzed.

Determination of PAH Levels
The analysis of PAHs comprised 16 individual compounds from the US EPA list. Soil samples were sieved to obtain a grain size ≤ 0.10 mm and spiked with 10 µl of internal standard solution containing 5 deuterated PAHs: d8-naphthalene, d10-acenaphthene, d10-phenanthrene, d12-chrysene and d12-perylene, each at a concentration of 100 µg cm−3. Samples were extracted with dichloromethane in an Accelerated Solvent Extractor (ASE200, Dionex Co., Sunnyvale, CA, United States) at an extraction temperature of 100 °C, a static time of 5 min, and a pressure of 1200 psi. The extracts were concentrated in 1 ml hexane, cleaned up on glass columns filled with 1 g activated silica gel suspended in dichloromethane, and finally eluted with 5 ml CH2Cl2/n-hexane (2/3, v/v). PAH levels were determined by triple quadrupole gas chromatography-mass spectrometry (GC-MS/MS) on an Agilent 7890B GC system (Agilent Tech., Santa Clara, CA, United States) equipped with an Agilent 7000C detector and an Agilent 7693 Autosampler. PAH resolution was achieved on an HP-5 MS fused capillary column with a film thickness of 0.25 µm, at a splitless injection system temperature of 250 °C, with helium as the carrier gas. Data were collected in Multiple Reaction Monitoring (MRM) mode. A certified reference material (CRM 131), a laboratory control sample and a solvent blank were used for quality assurance and quality control (QA/QC). The precision, expressed as relative standard deviation (RSD), was in the range of 5-12%, and the recovery for individual compounds from CRM 131 was within 62-84%. The limit of quantification (LoQ) for individual PAH compounds ranged from 0.02 to 2.10 µg kg−1, while the limit of detection (LoD) was within the 0.01-0.81 µg kg−1 range.
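The precision and recovery figures quoted above rest on two standard definitions: RSD is the sample standard deviation divided by the mean, and recovery is the measured concentration divided by the certified one, both in percent. The following is a minimal Python sketch of these two calculations. The replicate values and the certified value are made up for illustration and are not data from this study; the function names are likewise hypothetical.

```python
import statistics

def relative_std_dev(values):
    """Relative standard deviation (RSD) in percent: sample SD / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def recovery_percent(measured, certified):
    """Recovery in percent: measured concentration / certified concentration * 100."""
    return 100.0 * measured / certified

# Hypothetical replicate results for one PAH in a reference material (µg/kg).
replicates = [0.90, 1.00, 1.10]
# Hypothetical certified concentration for the same compound (µg/kg).
certified_value = 1.25

rsd = relative_std_dev(replicates)
recovery = recovery_percent(statistics.mean(replicates), certified_value)
print(f"RSD = {rsd:.1f}%, recovery = {recovery:.1f}%")
```

In practice these quantities would be computed per compound, so that the ranges reported for the 16 PAHs are the minimum and maximum over compounds.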

Extraction of Petroleum Hydrocarbons
Five grams of dry soil with grain size ≤ 0.10 mm were extracted with 120 cm3 petroleum ether in an automated Soxhlet apparatus (Buchi Universal Extraction System, Buchi 811) for 35 cycles at 40-60 °C. Extracts were collected in glass vials and evaporated on a vacuum rotary evaporator at 40 °C to near dryness. The glass vials were then left open under a flow hood to remove traces of ether. The concentration of petroleum hydrocarbons, expressed as g kg−1 of soil dry mass, was evaluated from the weight of the concentrated hydrocarbon extract. The extractions were carried out in triplicate. For quality control, a solvent blank was included for each analytical series.

Determination of Trace Element Content in Soil Samples -ICP-MS Technique
Soil samples were digested in aqua regia using a medium-pressure (32 bar) microwave digestion system (Mars Xpress, CEM Corp., Matthews, NC, United States). Quantitative analysis of metal content was conducted on an ICP-MS 7500ce instrument (Agilent Technologies, Santa Clara, CA, United States). For the digestion, 0.5 g of air-dried soil, sieved through a 2 mm mesh and ground in a mortar grinder, was used. Aqua regia was prepared from Instra-Analyzed grade hydrochloric and nitric acids from J.T. Baker Chemical Co, Phillipsburg, NJ, United States. After digestion, the solution was transferred to a Falcon vial and diluted to 50 ml with distilled water (0.05 µS/cm). Prior to ICP-MS analysis, samples were diluted 10 times. Exactly the same procedure was performed for blanks and certified reference materials.
Table 1 note: Public data were deposited in the Sequence Read Archive (SRA), NCBI: https://www.ncbi.nlm.nih.gov/sra/SRP146074. OWP, soil sample taken within a radius of 0.5 m; OW, soil sample taken at a distance of 3 m from the oil well.
To minimize the matrix effect and ensure long-term stability, analyses were conducted in the presence of 45Sc, 89Y, and 159Tb as internal standards. The accuracy of the method was 10% and the quantification limits were 0.01 mg kg−1. Detailed information about the methods is given in Gałązka and Gałązka (2015).
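The sample mass and dilution steps described in this section fix the conversion from the concentration read by the instrument to the element content in soil. As a worked sketch, the symbols below are introduced here for illustration only: $c_{\mathrm{meas}}$ is the concentration in the measured solution, $c_{\mathrm{blank}}$ the procedural blank, $V$ the final digest volume, $DF$ the dilution factor and $m$ the soil mass. Blank subtraction is an assumption, since the text states only that blanks were processed in the same way.

```latex
\[
w\,[\mathrm{mg\,kg^{-1}}]
  = \frac{\left(c_{\mathrm{meas}} - c_{\mathrm{blank}}\right)[\mathrm{\mu g\,L^{-1}}]
          \times V\,[\mathrm{L}] \times DF}{m\,[\mathrm{g}]}
  = \frac{\left(c_{\mathrm{meas}} - c_{\mathrm{blank}}\right) \times 0.050 \times 10}{0.5}
  = \left(c_{\mathrm{meas}} - c_{\mathrm{blank}}\right) \times 1.0
\]
```

With $V = 50$ ml, $DF = 10$ and $m = 0.5$ g, a blank-corrected reading of 1 µg L−1 in the measured solution therefore corresponds to 1 µg g−1, that is 1 mg kg−1, in soil.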

CLPP Analysis Using Biolog EcoPlates
The CLPP was evaluated using Biolog EcoPlates (Biolog Inc., Hayward, CA, United States) with 31 different carbon sources (Pohland and Owen, 2009). The soil suspension for inoculating the wells of the microplates was prepared as follows. One gram of soil was weighed and transferred to a conical flask holding 99 cm3 of sterile 0.9% NaCl, vortexed for 30 min at 150 rpm and 25 °C, after which the samples were cooled for 30 min to 4 °C. After that, 120 mm3 was transferred to each well of an EcoPlate and incubated in the dark at 25 °C for 216 h. The experiment included three replications. The results were read on a MicroStation ID system (Biolog Inc., Hayward, CA, United States). The extent to which carbon sources were used was determined through the reduction of colorless tetrazolium chloride (TTC) to red triphenyl formazan (TPF) (Islam et al., 2011; Gałązka et al., 2017). The intensity of color development was recorded at λ = 590 nm for a period of 216 h at 24-h intervals. The most intensive metabolism of carbon substrates was observed after 72-120 h of incubation. The activity of soil microorganisms was evaluated for all carbon sources and for grouped sources defined as amines and amides, amino acids, carbohydrates, carboxylic acids and polymers (Rutgers et al., 2016). The results were expressed as Average Well-Color Development (AWCD) and the Shannon-Wiener (H′), Simpson (D), Richness (R), and Evenness (E) indices.
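The indices named above can all be derived from one set of blank-corrected absorbances. The following Python sketch shows one common way to do so, under stated assumptions: AWCD is the mean of the blank-corrected absorbances over the substrate wells, H′ = −Σ p_i ln p_i with p_i the share of each well in the summed absorbance, R counts wells above a threshold (0.25 here, an assumed cut-off), E = H′/ln R, and the Simpson index is taken in its proportion form 1 − Σ p_i². The function name, the threshold and the example readings are hypothetical and are not taken from this study.

```python
import math

def clpp_indices(substrate_od, control_od, threshold=0.25):
    """Summarize one EcoPlate reading (one replicate block, one time point)."""
    # Blank-correct every substrate well; negative differences are set to zero.
    corrected = [max(od - control_od, 0.0) for od in substrate_od]
    total = sum(corrected)
    awcd = total / len(corrected)
    # p_i: share of each well in the summed corrected absorbance.
    props = [c / total for c in corrected if c > 0.0] if total > 0.0 else []
    shannon = -sum(p * math.log(p) for p in props)
    # Richness: number of wells counted as "used" (the threshold is an assumption).
    richness = sum(1 for c in corrected if c > threshold)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    # Simpson index in its proportion form, 1 - sum(p_i^2) (assumed variant).
    simpson = 1.0 - sum(p * p for p in props) if props else 0.0
    return {"AWCD": awcd, "H": shannon, "R": richness, "E": evenness, "D": simpson}

# Hypothetical absorbances at 590 nm: four substrate wells and the water control.
example = clpp_indices([1.2, 1.2, 1.2, 1.2], control_od=0.2)
print(example)
```

For a plate in which every substrate is used equally, H′ reaches ln R and E equals 1, which is a convenient sanity check when applying the function to real plate readings.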

DNA Extraction, NGS and Bioinformatics
The FastDNA SPIN Kit for Soil (MP Biomedicals, Solon, OH, United States) was used to extract total DNA from 0.5 g of soil.
The concentration and quality of the extracted DNA was determined using NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, United States).
Metagenomic analysis was conducted based on the hypervariable V3-V4 region of the 16S rRNA gene. Specific primers 341F and 785R were used for amplification of this region and library preparation. PCR was conducted with the Q5 Hot Start High-Fidelity DNA Polymerase kit (NEB Inc., Ipswich, MA, United States) with reaction conditions according to the manufacturer's specifications. Sequencing was conducted on a MiSeq sequencer in 2 × 250 bp paired-end (PE) technology using the v2 Illumina chemistry kit. The reactions were carried out according to the Illumina V3-V4 16S rRNA amplification protocol (Illumina, San Diego, CA, United States) and sequencing was performed on an Illumina MiSeq PE300 (Genomed S.A., Warsaw, Poland). Automatic data analysis was performed on the MiSeq and in the BaseSpace cloud environment by Illumina, using the 16S Metagenomics protocol (ver. 1.0.1). The libraries were prepared in a way analogous to the attached Illumina protocol.
Amplicon sequence variants (ASVs) were extracted from fastq files, dereplicated to remove redundancy, using the DADA2 version 1.8 package (Callahan et al., 2016) in R version 3.4.3 (R Core Team, 2016) with the following parameters. Based on quality plots, filterAndTrim was used to trim the forward and reverse sequences to 250 bp and to remove the first 20 bp on the left (containing primers and low-quality bases) from reads in both directions. Filtering of sequences was set to maxN = 0, maxEE = 5 (for both reads), and truncQ = 2, where maxN is the maximum number of "N" bases, maxEE is the maximum number of expected errors calculated from the quality scores ($EE = \sum 10^{-Q/10}$), and truncQ truncates reads at the first instance of a quality score ≤ 2. Other parameters were set to default. The error rates were estimated by learnErrors, with the nbases parameter set to $10^{8}$. Sequences were dereplicated using derepFastq with default parameters, and exact sequence variants were resolved using dada. Next, removeBimeraDenovo was used to remove chimeric sequences, applying the consensus method. At this step, 0.047% of sequences were identified as chimeric and removed.
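To make the filtering criteria above concrete, the following Python sketch applies the three quality rules to a single read: truncation at the first base with quality ≤ truncQ, rejection of reads with more than maxN ambiguous bases, and rejection of reads whose expected errors, EE = Σ 10^(−Q/10), exceed maxEE. This is a simplified illustration and not the DADA2 implementation: it omits the length trimming, paired-read handling and the exact order of operations of filterAndTrim, and the function names are hypothetical.

```python
def expected_errors(quals):
    """Expected number of errors in a read: sum over bases of 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)

def filter_read(seq, quals, max_n=0, max_ee=5.0, trunc_q=2):
    """Return the (possibly truncated) read if it passes, otherwise None."""
    # Truncate at the first base whose quality score is <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq, quals = seq[:i], quals[:i]
            break
    if not seq:
        return None
    # Reject reads with more ambiguous bases than allowed.
    if seq.upper().count("N") > max_n:
        return None
    # Reject reads whose summed error probability exceeds the limit.
    if expected_errors(quals) > max_ee:
        return None
    return seq

# Hypothetical reads with Phred scores given as integers.
print(filter_read("ACGTA", [30, 30, 2, 30, 30]))   # truncated before the Q2 base
print(filter_read("ACNT", [30, 30, 30, 30]))       # contains an N
print(filter_read("A" * 20, [3] * 20))             # too many expected errors
```

A read of 250 bases at a uniform Q20 carries 2.5 expected errors, so a maxEE of 5 is a fairly permissive limit for reads of this length.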
Taxonomy was assigned against the latest version of the RDP database. RDP taxonomic training data formatted for DADA2 (RDP trainset 16/release 11.5) were used with the Naïve Bayesian Classifier (Wang et al., 2007) implemented in assignTaxonomy, with the minBoot parameter set to 50. Sequences classified as mitochondria or chloroplasts were filtered out using the subset_taxa function in the phyloseq package (McMurdie and Holmes, 2013). The resulting taxa table was agglomerated at each taxonomic level using tax_glom, and unclassified reads were retained for statistical purposes. Next, the taxa abundances were transformed into percentages.
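The agglomeration and percentage steps described above can be illustrated in plain Python. The sketch below sums ASV counts by their label at a chosen rank, keeps unclassified reads under an explicit "unclassified" label, and converts the totals to percentages. It is a plain-Python analogue for illustration and not the phyloseq code path; the rank list, function names and counts are hypothetical.

```python
from collections import defaultdict

RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")

def agglomerate(asv_counts, taxonomy, rank):
    """Sum ASV counts by their label at one rank; missing labels are kept as 'unclassified'."""
    idx = RANKS.index(rank)
    totals = defaultdict(int)
    for asv, count in asv_counts.items():
        label = taxonomy[asv][idx] or "unclassified"
        totals[label] += count
    return dict(totals)

def to_percent(counts):
    """Convert absolute counts to percentages of the sample total."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical counts and lineages for three ASVs in one sample.
counts = {"asv1": 30, "asv2": 50, "asv3": 20}
lineages = {
    "asv1": ("Bacteria", "Proteobacteria", "Alphaproteobacteria", None, None, None),
    "asv2": ("Bacteria", "Proteobacteria", "Gammaproteobacteria", None, None, None),
    "asv3": ("Bacteria", None, None, None, None, None),
}
phylum_percent = to_percent(agglomerate(counts, lineages, "Phylum"))
print(phylum_percent)
```

Keeping the unclassified reads in the denominator, as the text describes, means the percentages of named taxa are not inflated when a large share of reads lacks an assignment at that rank.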
The basic sequence information was deposited in the Sequence Read Archive (SRA), NCBI (Table 1).
Statistical analyses were performed using the Statistica 10.0 software package (Statsoft Inc., United States). The collected data were subjected to analysis of variance (ANOVA) for the comparison of means. In addition, significant differences were determined using Tukey's HSD post hoc test at the P < 0.05 significance level. The AWCD was evaluated according to Garland and Mills (1991) using the formula $AWCD = \sum (C - R)/95$, where C is the absorbance in each well and R is the absorbance in the control well. The Shannon-Wiener (H′) index was evaluated using the formula $H' = -\sum p_i \ln p_i$, where $p_i$ is the ratio of the absorbance of each well to the summed absorbance of all wells (Gomez et al., 2006). The Simpson (D) index was evaluated using the formula $D = 1 - \sum n(n-1)/[N(N-1)]$, where n is the number of individuals of each species and N is the total number of individuals of all species. Tukey's range test was used to identify homogeneous groups at the significance level P = 0.01. The results were also submitted to principal component analysis (PCA) in order to determine the common relations between the bacterial core metagenome and soils collected from different oil wells. Permutational multivariate analysis of variance (PERMANOVA) was used to compare the bacterial diversity between soils taken from different sites. This was performed with 999 permutations using the Adonis function of the PAST package (v 3.16).
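For readers unfamiliar with PERMANOVA, the following Python sketch shows the core of a one-way test: a pseudo-F statistic is computed from a pairwise distance matrix, and its P value is estimated by repeatedly shuffling the group labels. Several points are assumptions made for illustration: Bray-Curtis is used as the distance (the text does not name the measure), the abundance profiles are invented, and the code is a generic textbook formulation rather than the routine used in this study.

```python
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F computed from a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i, lab in enumerate(groups) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(samples, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation P value) for the given grouping."""
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs = pseudo_f(dist, groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # The observed labelling counts as one permutation, hence the +1 terms.
    return f_obs, (hits + 1) / (permutations + 1)

# Hypothetical taxon abundance profiles: three samples per sampling distance.
profiles = [[40, 30, 5], [42, 28, 6], [38, 33, 4],
            [10, 12, 50], [8, 15, 48], [12, 10, 52]]
labels = ["OWP", "OWP", "OWP", "OW", "OW", "OW"]
f_value, p_value = permanova(profiles, labels)
print(f"pseudo-F = {f_value:.2f}, P = {p_value:.3f}")
```

With only three samples per group there are few distinct labellings, so the smallest attainable P value is bounded well above 1/1000 regardless of how many permutations are drawn.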

RESULTS
Evaluation of the functional and structural diversity of bacteria in soil with long-term petroleum contamination was based on two methods, used as a means of establishing parameters for determining soil quality: the NGS technique (V3-V4 16S rRNA gene region) and the CLPP method.

Chemical Analysis of Soil Samples
Soils contaminated with petroleum were characterized by pH in the range from 4.75 (OWP3) to 5.54 (OW3; Table 2). The organic carbon (Corg) content ranged from 3.07% (OWP1) to 6.07% (OW1). The soil taken from oil well no. 1 (OWP1) was characterized by the highest content of PAHs (Σ16 PAHs = 3.062 mg·kg−1; Table 2). The Σ16 PAHs in the other soil samples ranged from 1.885 mg·kg−1 (OW3) to 2.938 mg·kg−1 (OW1). The samples taken directly from oil wells were characterized by higher contents of PAHs compared to the samples taken from 3 m distance (except for sample OW1). These soils were also characterized by a higher content of trace elements (Table 3). The content of individual metals in the soils varied and depended on the collection site and the distance from the oil well. The highest content of Mn (896.1 mg·kg−1) was observed in soil sample OW1. Similar results were observed for sodium (Na) and calcium (Ca) (Table 3).
Table note: The results are shown as the mean of three repetitions (n = 3). OWP, soil sample taken within a radius of 0.5 m; OW, soil sample taken at a distance of 3 m from the oil well. Treatment means separated by different letters differ significantly (Tukey's mean separation test, P < 0.05).

Functional Diversity of Soils Assessed by CLPP
The soil samples collected directly from oil wells were characterized by higher CLPP values (Figure 2). Heatmaps of the carbon utilization patterns for the substrates on the Biolog EcoPlates, incubated for 120 h, showed significant differences between soil samples. The highest activity in carbon utilization patterns was observed in soils taken directly from the oil wells: OWP1, OWP2, and OWP3 (Figure 2). This result might indicate that during the long-term contamination the autochthonous microorganisms adapted to live in this environment and were able to use PAHs as their only source of carbon and energy. Notably, the highest diversities based on Shannon-Wiener indexes were obtained in soils taken directly from oil wells: OWP1 (H′ = 3.342), OWP2 (H′ = 3.228), and OWP3 (H′ = 3.282; Table 4). A higher richness value (R = 30) and average well-color development (AWCD = 1.553) were observed in soil OWP3. In addition, the lowest values of substrate evenness (E = 0.965) and Simpson index of diversity (D = 0.960) were found in this same soil sample (OWP3). All five main groups of carbon sources (carbohydrates, polymers, carboxylic and acetic acids, amino acids, amines and amides) were efficiently used by microorganisms. Amino acids, carbohydrates, and carboxylic and acetic acids were used by microorganisms much better than the other two groups, polymers and amines (Figure 3). Selected indicators of soil microbial diversity (PAH content and Biolog indexes) explained 87.46% of the biological variability in soils. Biodiversity indicators obtained from Biolog EcoPlates were strongly correlated with Corg, pH, Σ16 PAHs and aliphatics (Figure 4). This might indicate that these hydrocarbons were being degraded by bacteria in the soil of this environment. Based on PCA, 2 different groups of soils were obtained: group I (OW1, OW2, and OW3) and group II (OWP1, OWP2, and OWP3; Figure 4).
These results confirmed that functional bacterial diversity depended strongly on where the sample was collected. Positive correlations were shown between the first and second components of PCA (PCA1 = 70.90%, PCA2 = 16.56%, respectively) and carbon source in the 120 h Biolog EcoPlate incubated soils (Table 5).
Table caption: Changes of functional diversities of bacterial communities in soils long-term contaminated with crude oil, as evaluated by the Shannon-Wiener general diversity index (H′), Simpson index of diversity (D), substrate richness (R), substrate evenness (E) and average well-color development (AWCD590). Data obtained from the Biolog EcoPlates incubated for 48 h.

Bacterial Genetic Diversity of Soils
There were differences between the metabolic profiles (CLPPs) of the analyzed soils in microbial communities demonstrated by Biolog EcoPlates and the metagenomics approach based on the V3-V4 16S rRNA gene region using NGS. Significant differences in bacterial structure were found between soils taken at different distances from the oil wells. Significant differences were found by PERMANOVA at the family level (P = 0.029, F = 3.201), genus level (P = 0.027, F = 2.922), and species level (P = 0.034, F = 2.563).
Table note: Values marked with (*) are statistically significant (P < 0.05), n = 3.
The Venn diagram for the bacterial communities of the three oil wells, based on data from NGS, is presented in Figure 5. In the case of soils collected directly from oil wells (OWP1, OWP2, and OWP3), 62% of the bacterial microbiome was found to be in common. By contrast, soils collected from a distance of 3 m (OW1, OW2, and OW3) had only 38% in common. This might be explained by the development of completely different groups of bacteria in contaminated soils. Comparing the two sampling distances at each well, the share of overlapping bacteria was 41% for oil well no. 1 (OW1), 61% for OW2, and 58% for oil well no. 3 (OW3) (Figure 5).
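One way to express such overlaps numerically is the share of taxa found in every compared sample relative to all taxa found in any of them. The Python sketch below implements this definition. It is an assumption made for illustration, since the text does not state how the percentages in Figure 5 were calculated; the function name and taxon labels are hypothetical.

```python
def shared_percentage(*taxa_sets):
    """Percent of all observed taxa (the union) that occur in every sample (the intersection)."""
    sets = [set(s) for s in taxa_sets]
    union = set().union(*sets)
    common = set.intersection(*sets)
    return 100.0 * len(common) / len(union) if union else 0.0

# Hypothetical taxon lists for three samples.
sample_a = {"taxon1", "taxon2", "taxon3"}
sample_b = {"taxon2", "taxon3", "taxon4"}
sample_c = {"taxon2", "taxon3", "taxon5"}
overlap = shared_percentage(sample_a, sample_b, sample_c)
print(f"{overlap:.1f}% of taxa are shared by all three samples")
```

An abundance-weighted variant, in which each taxon contributes its read share rather than a count of one, would give different numbers and may be closer to what a given Venn tool reports.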
A strong correlation between the phyla in different soil samples was observed (Figure 6A). These analyses explained 71.6% of the biodiversity in the soils. Based on PCA of the bacterial phyla, 3 different groups of soils were obtained: group I (OW1 and OW3), group II (OW2) and group III (OWP1, OWP2, and OWP3). Sequences from the V3-V4 16S rRNA gene region, assigned to the reference database, are presented in Figure 6B. All the main phyla of bacteria were observed: Acidobacteria, Proteobacteria, Actinobacteria, Nitrospirae, Bacteroidetes, Chloroflexi, Gemmatimonadetes, Cyanobacteria, Firmicutes, Planctomycetes, and Verrucomicrobia (Figure 6). Actinobacteria was the dominant phylum in soil taken 3 m from oil well OW1 (45.82%). Acidobacteria was the dominant phylum in soil at the same distance from OW2 (19.71%). Proteobacteria was the dominant phylum in soil taken directly from oil wells (OWP2, 43.80%; OWP1, 42.42%; OWP3, 39.77%; Figure 6C).
Significant differences in the main families of Proteobacteria were observed in contaminated soils depending on where they were collected (Figure 9). Some family members belonging to Alphaproteobacteria were dominant in soils taken directly from oil wells: Mycobacteriaceae, Methylococcaceae, Bradyrhizobiaceae, Rhizobiaceae, Rhodobacteraceae, Acetobacteraceae, Hyphomicrobiaceae, and Sphingomonadaceae. The Streptophyta and the Gp6 family were also dominant at this location (Figure 9); however, other strains were abundant in soils taken 3 m from the oil wells.
Valuable information could also be obtained from analysis of bacterial structure at the genus level. However, many of the analyzed sequences were unidentified. There were statistically significant differences in the bacterial structure at the genus level between soils collected directly from the oil well and those collected at a distance of 3 m (Figure 10). On the other hand, the analysis of bacterial composition in soil contaminated with crude oil at the species level did not produce good results. Further studies at this level will need to be carried out by sequencing longer DNA fragments. Even so, in the case of species-level analysis, most of the sequenced species are noncultivated bacteria that have not been classified (Figure 10).

DISCUSSION
Polycyclic aromatic hydrocarbons and other petroleum derivatives have a high potential to accumulate in the soil environment, where they can interfere with the soil's microbiome (Gałązka and Grządziel, 2016). Bacteria that biologically degrade petroleum derivatives often show synergistic effects, and their use is one of the most effective and secure ways of removing hydrocarbons and PAHs from the environment, though the process is lengthy and multistage (Semple et al., 2006). Knowledge of the nature and diversity of bacterial communities in oil fields is still scarce, and their metabolic activities in situ are largely unknown. There is a great need for research directly at sites of contamination in order to identify and characterize bacteria, both genetically and functionally. Clearly, such studies can benefit from using a combination of molecular-based techniques and functional analysis (Li et al., 2008).
Petroleum compounds are mainly composed of carbon and hydrogen, and only a small number of them contain nitrogen (Vazquez et al., 2013; Silva et al., 2015). This causes the C:N ratio to increase significantly in contaminated soil (Sutton et al., 2013). Determining the synergistic interactions, and taking into account the optimum plant-microsymbiont systems, could be the beginning of work on the development of effective and ecologically sound cometabolic systems for treating soils contaminated with petroleum derivatives and trace elements (Nwaichi et al., 2015). de Oliveira et al. (2008) showed statistically significant differences in the dominant bacterial taxa in oil samples originating from three reservoirs with distinct levels of biodegradation. The dominant bacterial species were Rhodococcus sp., Acidithiobacillus ferrooxidans, Alicyclobacillus acidoterrestris, Bacillus spp., and Streptomyces sp.; Halanaerobium sp. and Pseudomonas sp. were also found.
Our study showed that long-term contamination of soil also induces changes in the bacterial community structure and its metabolic activity, with the emergence of different groups of bacteria. Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria were strongly correlated with biological activity in soils taken directly from oil wells (OWP1, OWP2, and OWP3). Deltaproteobacteria dominated in soil OW2. Some specific families of Alphaproteobacteria were dominant in soil taken directly from oil wells: Bradyrhizobiaceae, Rhizobiaceae, Rhodobacteraceae, Acetobacteraceae, Hyphomicrobiaceae, and Sphingomonadaceae. By contrast, other families were found in soil taken 3 m from the oil wells. The highest biodiversity, based on Shannon-Weaver indexes, was found in soils taken directly from oil wells.
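For readers less familiar with the Shannon-Weaver index referred to above, the following minimal sketch shows how it is commonly computed from per-taxon read counts, as H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The function name and the example counts are hypothetical and are not data from this study; the natural logarithm is an assumption, since the index can also be reported with other logarithm bases.

```python
import math


def shannon_index(counts):
    """Return the Shannon index H' = -sum(p_i * ln(p_i)).

    `counts` holds one non-negative abundance (e.g. read count) per taxon.
    Taxa with a zero count contribute nothing to the sum.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log (assumed base)
    return h


# Hypothetical read counts, for illustration only (not data from this study).
even_community = [25, 25, 25, 25]    # four equally abundant taxa
skewed_community = [97, 1, 1, 1]     # one taxon dominates

h_even = shannon_index(even_community)
h_skewed = shannon_index(skewed_community)
```

With equal total reads, the evenly distributed community yields the higher index, which is why a higher value is read as greater diversity.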
Model microorganisms used in bioremediation include members of the genera Mycobacterium and Pseudomonas. Mycobacterium sp. strain PYR-1 decomposes all 3-, 4-, and 5-ring PAHs, except for chrysene (Child et al., 2007). In soils contaminated with anthracene, phenanthrene, and pyrene, the microorganisms degrading PAHs are mostly of the genus Pseudomonas (Doong and Lei, 2003), including P. fluorescens, P. putida, and P. paucimobilis. Also among these is P. stutzeri, which fixes atmospheric nitrogen while using various substrates as its source of carbon and energy.
New species and strains capable of using PAHs as their sole carbon and energy source have mainly been isolated from contaminated soils. They include Mycobacterium pallens sp. nov., M. crocinum sp. nov., M. rutilum sp. nov., M. rufum sp. nov., and M. aromaticivorans sp. nov. These strains were described as new species of bacteria (sp. nov.) capable of degrading PAHs (Zhong et al., 2011). More recently, electron acceptors other than oxygen have been introduced into the environment to stimulate anaerobic decomposition of organic pollutants. Under these conditions, microorganisms can degrade most aliphatic and aromatic hydrocarbons. These compounds are completely or partially degraded by bacteria that are denitrifying, sulfate-reducing, iron or molybdenum nitrogen-reducing, or methanogenic (Chen et al., 2005). Bacteria such as Azoarcus (A. toluvorans, A. toluclasticus, A. evansii), Paracoccus, Ochrobactrum, Thauera, Burkholderia kururiensis, Bradyrhizobium, and Mesorhizobium can decompose various benzene compounds during denitrification. The biodegradation and detoxification of phenols and trans-dihydrodiols can occur simultaneously with the transformation of sulfates, glucuronic acid, glucose, or xylose. Some strains of the genus Pseudomonas (strains T and K172) can oxidize toluene and m-xylene under anaerobic conditions.
In our study, the dominant classes were Acidobacteria, Solibacteres, Chloracidobacteria, Acidimicrobiia, Actinobacteria, Thermoleophilia, Cytophagia, Bacilli, Nitrospira, Planctomycetia, Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Gammaproteobacteria, and Spartobacteria. The presence of these bacteria in the contaminated soil depended significantly on both the contamination site and the petroleum derivatives found in the soils. These bacteria were very sensitive indicators of soil quality in contaminated soils.
The bacterial composition of the soil collected from the oldest oil wells in Węglówka was not typical of soils contaminated with petroleum. Soil structure is the main determinant of genus composition; however, in this study all the soils belonged to one type. Long-term contamination has resulted in the adaptation of select groups and types of bacteria to the contaminated conditions. Notably, the presence of such species as Rhizobium leguminosarum and Azospirillum brasilense confirmed the high bioremediation activity of these soils. Azospirillum spp. have been shown to be involved in the bioremediation of soils artificially contaminated with PAHs (Gałązka et al., 2012; Gałązka and Gałązka, 2015).
Selection of bacteria capable of bioremediating PAHs could be extremely useful in determining the role of specific microorganisms in soil. It is essential to understand the functional and structural diversity of bacteria at a contaminated site. Only with this knowledge can one contribute to the proper management and cleaning of such areas.

CONCLUSION
(1) The results of our study indicated significant differences in both genetic and catabolic bacterial diversity in soils subjected to long-term petroleum contamination. The soil samples collected directly from the oil wells were characterized by higher biodiversity. These soils were also characterized by a different structural diversity and by some different groups of bacteria compared to the soils taken at a distance of 3 m from the oil wells.
(2) The use of two research methods (the bacterial NGS and CLPP techniques) contributed to a better understanding of the functional and structural diversity of bacteria in contaminated soils.
(3) The content of different trace elements and PAHs in the soil samples contributed to the differentiation of the bacterial composition in the soils. Individual trace elements could be involved in the activation of several enzymes which could aid cometabolic degradation of PAHs. This might indicate that during long-term contamination autochthonous microorganisms have adapted to live in this environment and were able to use PAHs as their only source of carbon and energy.
(4) Determining the bacterial structure and function in contaminated soil is the basis for further studies to identify bacterial strains that are active in bioremediation. Future studies are needed to select bacterial strains that are particularly active in PAH degradation and could be used in the active management of bioremediation processes.
(5) Some family members belonging to Alphaproteobacteria were dominant in soils taken directly from oil wells: Mycobacteriaceae, Methylococcaceae, Bradyrhizobiaceae, Rhizobiaceae, Rhodobacteraceae, Acetobacteraceae, Hyphomicrobiaceae, and Sphingomonadaceae. These bacterial families may be good indicators of petroleum-contaminated soils.