Geochemistry and Microbiology Predict Environmental Niches With Conditions Favoring Potential Microbial Activity in the Bakken Shale

The Bakken Shale and underlying Three Forks Formation is an important oil and gas reservoir in the United States. The hydrocarbon resources in this region are accessible using unconventional oil and gas extraction methods, including horizontal drilling and hydraulic fracturing. However, the geochemistry and microbiology of this region are not well understood, although they are known to have major implications for productivity and water management. In this study, we analyzed the produced water from 14 unconventional wells in the Bakken Shale using geochemical measurements, quantitative PCR (qPCR), and 16S rRNA gene sequencing with the overall goal of understanding the complex dynamics present in hydraulically fractured wells. Bakken Shale produced waters from this study exhibit high measurements of total dissolved solids (TDS). These conditions inhibit microbial growth, such that all samples had low microbial loads except for one sample (well 11), which had lower TDS concentrations and higher 16S rRNA gene copies. Our produced water samples had elevated chloride concentrations typical of other Bakken waters. However, they also contained a sulfate concentration trend that suggested higher occurrence of sulfate reduction, especially in wells 11 and 18. The unique geochemistry and microbial loads recorded for wells 11 and 18 suggest that the heterogeneous nature of the producing formation can provide environmental niches with conditions conducive for microbial growth. This was supported by strong correlations between the produced water microbial community and the associated geochemical parameters including sodium, chloride, and sulfate concentrations. The produced water microbial community was dominated by 19 bacterial families, all of which have previously been associated with hydrocarbon-reservoirs. These families include Halanaerobiaceae, Pseudomonadaceae, and Desulfohalobiaceae which are often associated with thiosulfate reduction, biofilm production, and sulfate reduction, respectively. Notably, well 11 was dominated by sulfate reducers. Our findings expand the current understanding of microbial life in the Bakken region and provide new insights into how the unique produced water conditions shape microbial communities. Finally, our analysis suggests that produced water chemistry is tightly linked with microbiota in the Bakken Shale and shows that additional research efforts that incorporate coupled microbial and geochemical datasets are necessary to understand this ecosystem.

The Bakken Shale and underlying Three Forks Formation is an important oil and gas reservoir in the United States. The hydrocarbon resources in this region are accessible using unconventional oil and gas extraction methods, including horizontal drilling and hydraulic fracturing. However, the geochemistry and microbiology of this region are not well understood, although they are known to have major implications for productivity and water management. In this study, we analyzed the produced water from 14 unconventional wells in the Bakken Shale using geochemical measurements, quantitative PCR (qPCR), and 16S rRNA gene sequencing with the overall goal of understanding the complex dynamics present in hydraulically fractured wells. Bakken Shale produced waters from this study exhibit high measurements of total dissolved solids (TDS). These conditions inhibit microbial growth, such that all samples had low microbial loads except for one sample (well 11), which had lower TDS concentrations and higher 16S rRNA gene copies. Our produced water samples had elevated chloride concentrations typical of other Bakken waters. However, they also contained a sulfate concentration trend that suggested higher occurrence of sulfate reduction, especially in wells 11 and 18. The unique geochemistry and microbial loads recorded for wells 11 and 18 suggest that the heterogeneous nature of the producing formation can provide environmental niches with conditions conducive for microbial growth. This was supported by strong correlations between the produced water microbial community and the associated geochemical parameters including sodium, chloride, and sulfate concentrations. The produced water microbial community was dominated by 19 bacterial families, all of which have previously been associated with hydrocarbon-reservoirs. These families include Halanaerobiaceae, Pseudomonadaceae, and Desulfohalobiaceae which are often associated with thiosulfate reduction, biofilm production, and sulfate reduction, respectively. Notably, well 11 was dominated by sulfate reducers. Our findings expand the current understanding of microbial life in the Bakken region and provide new insights

INTRODUCTION
Hydrocarbon resources represent an important global energy source, currently providing over 75% of primary energy in the United States (U.S. Department of Energy et al., 2020b), and also play an essential role in other countries including Canada and the United Kingdom (Stevens, 2013;Elliott et al., 2014;Rivard et al., 2014). The majority of fossil fuels are currently extracted from unconventional wells using advanced technologies, including horizontal drilling and hydraulic fracturing (U.S. Department of Energy et al., 2020a). Horizontal drilling provides access to low-permeability target formations that were previously inaccessible using conventional, vertical drilling methods. This has resulted in rapidly increased amounts of produced oil and gas from new and established plays across the last decade (U.S. Department of Energy et al., 2020b). Hydraulic fracturing occurs when large volumes of water (5,000-20,000 m 3 ) are injected into the subsurface through the horizontally drilled well, increasing the subsurface pressure and fracturing the target formation to release hydrocarbons from the shale (Arthur et al., 2008;Gregory et al., 2011).
Tight oil reservoirs such as the Bakken Formation and underlying Three Forks Formation in North Dakota are one of the impermeable shale plays now accessible through horizontal drilling and hydraulic fracturing. However, these unconventional extraction techniques have introduced new water management and infrastructure challenges raising various operational, economic, and environmental issues for unconventional drilling (Gregory et al., 2011;Orangi et al., 2011). Hydraulically fractured wells generate billions of gallons of produced water each year (Horner et al., 2016) which contain high concentrations of salt (Gregory et al., 2011;Barbot et al., 2013;Murali Mohan et al., 2013;Cluff et al., 2014;Lipus et al., 2018;Wang et al., 2019), metals (Gregory et al., 2011;Barbot et al., 2013), and organics (Strong et al., 2014;Akyon et al., 2019) making management, handling, and disposal of these produced waters difficult and expensive. Another major challenge is the presence of various kinds of microorganisms, which may contribute to the corrosion of well-components, including casing and pipes, and reservoir souring Mohan et al., 2014;Stringfellow et al., 2014;Daly et al., 2016;Torres et al., 2016;Lipus et al., 2017Lipus et al., , 2018. Consequently, microbial activities in hydraulic fracturing may result in operational interruptions, negatively impact hydrocarbon recovery, and create safety and environmental concerns.
Understanding the geochemical and microbial characteristics of produced water from hydraulically fractured wells is essential for effective produced water management and microbial growth control. The Bakken Shale is currently the second most productive oil region in the United States (U.S. Department of Energy et al., 2020c) and contains reserves of 7.4 billions barrels of oil and 190 billion m 3 of natural gas (Gaswirth, 2013;Plummer et al., 2013), therefore it presents an important study and research target in this field. Previous geochemistry-focused studies report that Bakken Shale produced water is characterized by high total dissolved solids (TDS) (150,000-350,000 mg/L) compared to other active regions (Stepan et al., 2010;Kurz et al., 2016;Thyne and Brady, 2016;Lipus et al., 2018) which have TDS concentrations as low as 680 mg/L ( Barbot et al., 2013). Overall, less than 30 studies have examined the microbial community in hydraulically fractured wells, contributing to the identification of novel and shale-specific organisms. These studies demonstrate that produced water is dominated by halophilic, anaerobic, and aerobic groups of microbes, revealing unique metabolic survival strategies (Daly et al., 2016;Liang et al., 2016;Lipus et al., 2017). Only four of these studies have contained samples from the Bakken Shale, with a total of 25 wells represented from this region (Strong et al., 2014;An et al., 2017;Lipus et al., 2018;Wang et al., 2019). The results from these studies are variable. However, previous work suggests shale-associated waters have unique geochemical-and microbial-profiles impacted by factors including season, geographic region, biocidal treatments, well age, and the shale's post depositional history (Davis et al., 2012;Vikram et al., 2014;Akob et al., 2015;Hull et al., 2018;Lipus et al., 2018;Wang et al., 2019). Therefore, the discrepancy across the three Bakken Shale-focused studies is both unsurprising and highlights the need for additional work that incorporates coupled microbial and geochemical datasets within this region.
Our overall goal is to expand on existing knowledge of the geochemistry and microbiology in hydraulically fractured wells, which will increase insight into the complex geochemical and microbiological dynamics present in these systems. In this study, we analyzed the produced water from 14 unconventional wells in the Bakken and Three Forks Formation. We measured TDS, dissolved organic carbon (DOC), pH, and geochemical data for each produced water sample, and compared the geochemical trends with Bakken data available from the U.S. Geological Survey Produced Water Database (USGS PWDB) (Blondes et al., 2019). Quantitative PCR (qPCR) and 16S rRNA gene amplicon sequencing were used to estimate microbial loads and characterize the microbial community of produced water samples, when possible.
Finally, we measured correlations between geochemical and microbial data in order to identify which factors shape the Bakken-associated microbiome. These results provide additional knowledge about the Bakken region, provide valuable context for previous Bakken Shale microbiome studies, and will ultimately be used to improve waste water management and microbial growth control practices in the oil and gas industry.

Sampling
Produced water samples were collected from 14 hydraulically fractured wells in North Dakota actively producing oil from the Bakken Formation or the underlying Three Forks Formations at the time of sampling in May 2018 (Figure 1). As production data from these formations is typically reported together, we refer to the sampled region as the Bakken Shale or Bakken region throughout this manuscript. Each produced water sample was obtained from the associated three-phase separator, which separates hydrocarbons from the remaining fluids. Separators are often the closest available sampling point to the wellhead and have been used in previous studies (Cluff et al., 2014;Mohan et al., 2014;Lipus et al., 2017). Samples were collected in unused plastic carboys that were pre-rinsed with sample waters and allowed between 1 and 3 h to settle into distinguishable oil and water phases in the sealed container. A portion of the water phase was also collected in a sterile 1 L nalgene bottle and immediately placed on dry ice until arrival in the lab for microbiology analysis. Dry ice was replenished as needed, such that samples remained frozen until they arrived in the lab. Upon arrival in the lab, samples were stored according to previously described guidelines at −20 • C until microbial processing (Lipus et al., 2017(Lipus et al., , 2018(Lipus et al., , 2019. The remaining water phases were then passed through glass wool cartridges under gravity and then a 0.45 µm inline filter (EnviroTech, Salt Lake City, UT, United States), minimizing oxygen exposure in a closed loop. This filtered water was collected and measured for pH using a Horiba multimeter (Horiba, Edison, NJ, United States) and titrated for alkalinity using a Hach digital titrator (Hach, Loveland, CO, United States). Samples were collected with zero-headspace in Shimadzu TOC vials and transported on ice for total organic carbon (TOC) analysis. A 30 mL volume of the filtered water was acidified to 2% by volume using nitric acid and transported on ice for inductively coupled plasma optical emission spectroscopy (ICP-OES). Finally, a 15 mL volume of the filtered water was passed through a second 0.22 µm sterile polyethersulfone filter (Millipore, Inc.) and shipped on ice for ion chromatography (IC) measurement.

Chemical Analysis
Major cations and anions were detected using ion chromatography (IC) on a ThermoFisher (Thermofisher, Waltham, MA, United States) ICS-5000+ with AS11-HC column for anion quantification and CS16 column for cation quantification. All samples were run in triplicate and the standard error of IC measurements reported here was less than 3%. Additionally, every 10-20 samples, a cation/anion control sample (Sigma Aldrich, St Louis, MO, United States) with known certified concentrations was added during measurements and all control samples demonstrated an accuracy within 95-105%.
Trace metals were analyzed using inductively coupled plasma optical emission spectroscopy (ICP-OES) on an Optima 7300 DV (Perkin Elmer, Waltham, MA, United States), which is a dual-view spectrometer with solid state SCD detectors. U.S. EPA Method 6010D was employed for analysis with one duplicate, one standard recovery, and one spike recovery. Sample solutions were nebulized using a glass Seaspray concentric nebulizer and a glass Baffled Cyclonic spray chamber (Glass Expansion, Pocasset, MA, United States) using 5 ppm yttrium as an online internal standard. Calibration standards were purchased from Inorganic Ventures (Christiansburg, VA, United States) and are traceable to NIST standard reference materials.

DNA Extraction
We extracted DNA from the samples using a modified version of the DNeasy Powersoil kit (Qiagen, Hilden, Germany). Produced water samples were filtered through a 0.2 micron polyethersulfone membrane filter (Pall Corporation, Port Washington, NY, United States). The filter with the collected biomass was then transferred to a 1.5 mL microcentrifuge tube incubated with 1 mL TE buffer and 10 µl of 20 mg/mL lysozyme (Sigma Aldrich) at 37 • C for 30 min. Samples were exposed to 10 min of bead beating on a vortexer followed by extraction using the recommended guidelines in the manufacturer's protocol. DNA was eluted from spin filters in 100 µL nuclease free water. Four kit blanks were also concurrently extracted to confirm that no contamination occurred.

Quantitative PCR
The bacterial abundance in produced water samples was measured using qPCR with 16S rRNA gene primers (F: GTGSTGCAYGGYTGTCGTCA; R: ACGTCRTCCMCACCTTCCTC) designed by Maeda et al. (2003) with an expected amplicon size of 146 base pairs. qPCR reactions were run in triplicate on a Magnetic Induction Cycler (MIC) (Bio Molecular Systems, Upper Coomera, Australia). Each reaction contained 2X SensiFAST SYBR No-Rox master mix (Bioline, London, United Kingdom), 400 nM forward primer, 400 nM reverse primer, and 1 µL of template DNA for a total reaction of 20 µL. qPCR conditions consisted of a polymerase activation step at 95 • C for 2 min followed by 40 amplification cycles each consisting of: denaturation at 95 • C for 5 s, annealing at 62 • C for 5 s, and an extension step at 72 • C for 1 s. Standard curves were generated using gBlocks Gene Fragments (Integrated DNA Technologies, Coralville, IA, FIGURE 1 | Map of wellhead sampling locations; wellhead location does not indicate the relative size or location of the oil reservoir. Well 18 is located in Dunn County and is not featured on the inset map. Information about well characteristic, including well depth, are found in Table 1. Figure 1) and negative control samples were included in each amplification assay.

16S rRNA Library Preparation and Sequencing
Extracted DNA and kit blanks were amplified using universal primers targeting the V4 region of the 16S rRNA gene, as previously described (Caporaso et al., 2011(Caporaso et al., , 2012. In order to obtain enough PCR product, we pooled 8 technical replicates per sample. The pooled PCR products were cleaned using AMPure or SPRIselect beads (Beckman Coulter, Pasadena, CA), quantified using the Qubit dsDNA High Sensitivity Assay Kit (Life Technologies, Carlsbad, CA, United States), and visualized on a Bioanalyzer using the High Sensitivity DNA Kit (Agilent, Santa Clara, CA, United States). Negative PCR controls were also amplified, quantified, and visualized in order to confirm that no contamination occurred. In total, we attempted to amplify 4 kit extraction blanks and 6 negative PCR controls. After quantification and visualization, only one amplified kit extraction blank and one negative PCR control contained amplicons. These samples, identified as Blank 1 and 2 in the Supplementary Data 1, were sequenced with our experimental 16S rRNA libraries. Purified 16S rRNA libraries were pooled, diluted to a concentration of 2 nM, and denatured using fresh 0.2 M NaOH. Libraries were further diluted according to the manufacturer's instructions and sequenced on an Illumina Miseq (Illumina, San Diego, CA, United States) using a 300 cycle V2 Nano kit.

16S rRNA Data Analysis
16S rRNA gene sequences were analyzed using QIIME2 version 2019.10 (Bolyen et al., 2019). Sequences were imported as EMPSingleEndSequences and demultiplexed using the demux emp-single command. DADA2 (Callahan et al., 2016) was used to filter, denoise, and remove chimeras from the demultiplexed sequencing data. For this command, we utilized default settings and set the truncation length to 250 base pairs. The classifysklearn (Pedregosa et al., 2011) command was used to classify representative sequences identified through DADA2 using a pretrained Naive Bayes classifier trained on Silva 132 99% OTUs (Quast et al., 2012;Yilmaz et al., 2014) from the 515F/806R region and any sequences identified as chloroplast or mitochondria were filtered from the data.
Data generated by Qiime2 was imported into R for further analysis (R Core Team, 2013). We utilized the vegan package (Oksanen et al., 2012) to calculate the Chao1 index, Shannon index, and Bray-Curtis dissimilarity values. We also used the vegan package (Oksanen et al., 2012) to complete nonmetric multidimensional scaling (NMDS), calculate Analysis of Similarities (ANOSIM), complete the Mantel test, and to fit environmental parameters onto the generated NMDS ordination plot. All NMDS plots were constructed with k = 3 dimensions. NMDS, ANOSIM, and the Mantel test were completed using Bray-Curtis distance measurements calculated after sequence libraries were resampled to the depth of the sample with the fewest sequences from this experiment (1942 sequences) (Supplementary Table 1). The Mantel test also relied on the Euclidean distance of the geochemical data, which was calculated in Base R. The significance of the ANOSIM analysis, the Mantel test, and the environmental vectors was based on 999 permutations of the grouping, geochemical, and environmental data, respectively. Finally, we also used the RColorBrewer package to generate colorblind accessible palettes when necessary (Neuwirth, 2014).

Bakken Shale Exhibits High TDS Primarily Comprised of NaCl and a Localized Sulfate Trend
Produced water samples were obtained from 14 wells in the Bakken region (Figure 1). The pH was slightly below circumneutral with a range of 6.1-6.7 (Table 1), which is similar to previously reported results (Wang et al., 2019). The exception was well 6, which had a pH of 5.8 (Table 1). Alkalinity varied from low to high, with the majority of wells ranging from 186 to 740 mg/L as CaCO 3 . Exceptions were wells 5, 6, 7, and 10, which ranged from 1440 to 2000 mg/L as CaCO 3 ( Table 1). Although alkalinity of 2000 mg/L and higher have been reported in this region (Blondes et al., 2019;Chang et al., 2019), the variation is notable, as previous work involving these wells observed only low alkalinity measurements (Lipus et al., 2018). TOC was measured in the form of non-purgeable organic carbon (NPOC), which ranged from 63 to 543 mg as C/L ( Table 1), and is similar to previously reported DOC values for the Bakken region (Wang et al., 2019).
The TDS concentrations ranged between 135,000 mg/L and 357,000 mg/L ( Table 1) and were characterized by 86.1% wt to 88.6% wt Na + and Cl − ( Table 2). These TDS concentrations are similar to those previously reported within the Bakken Shale (Thyne and Brady, 2016;Lipus et al., 2018;Wang et al., 2019) and reinforce that produced water from the Bakken region has higher TDS concentrations than samples collected from other shale plays, including the Marcellus and Barnett (Akob et al., 2015;Lipus et al., 2017;Wang et al., 2019). These high salinities are likely derived from post-depositional halite and anhydrite dissolution processes (Bachu and Hitchon, 1996;Iampen and Rostron, 2000). TDS measurements in conjunction with major cation and anion concentrations can either reflect the formation brine (Iampen and Rostron, 2000;Engle et al., 2016) or fluid mixing processes related to hydraulic fracturing at unconventional wells (Haluszczak et al., 2013;Vengosh et al., 2017). Samples in this study display a positive linear association (R 2 = 0.9892) between TDS and chloride (Figure 2A), a trend that is seen in other Bakken produced waters (Thyne and Brady, 2016) and is expected since chloride is the primary anion in this study's samples. This trendline includes well 11, suggesting that this water's TDS and chloride concentrations are still compositionally similar to other samples in this study. This is notable, as produced water from well 11 exhibited a lower TDS concentration (135,000 mg/L) compared to the other samples, which had an average of 287,000 mg/L ( Table 1). TDS levels from well 11 also fall within the range of recorded Bakken produced water measurements when data reported in the USGS PWDB were plotted along with samples from this study (Blondes et al., 2019) (Supplementary Figure 2A). The low TDS of well 11 may be due to its spatial heterogeneities within the reservoir; Bakken produced waters have high spatial TDS variability, as there is minimal fluid flow in certain regions due to the formation's aquitard characteristics (Bachu and Hitchon, 1996;Hitchon, 1996).
Sulfate is a typical secondary anion in produced waters and increases in sulfate concentration can reflect drilling activity, either due to injected chemicals associated with hydraulic  All measurements are reported in mg/L and were taken using IC unless otherwise noted. # Measured using ICP-OES.
fracturing operations (Paukert Vankeuren et al., 2017) or due to injected water reacting with sulfur and sulfate bearing minerals (Osselin et al., 2019). Although anhydrite content is negligible, trace amounts of pyrite have been observed in the Bakken Formation (Pitman et al., 2001;Ashu, 2014). When TDS is plotted against sulfate and the youngest and most spatially unique sample (well 18) is excluded there is a negative linear relationship (R 2 = 0.853) ( Figure 2B). Although most wells were hydraulically fractured more than 4 years ago (Table 1), well 18 is 1 month removed from a completed hydraulic fracturing procedure. Therefore, the elevated sulfate concentration in well 18 is either the result of spatial heterogeneity or recent hydraulic fracturing. When TDS and sulfate for other Bakken produced waters are plotted (Supplementary Figure 2B), there is no clear linear association (R 2 = 0.0811) between the parameters. This suggests samples from this study are exhibiting a localized trend. Despite this, the young age of well 18 makes it difficult to conclude whether it falls off the linear trendline ( Figure 2B) due to its location or the recent hydraulic fracturing operation. Previous research has demonstrated that evaluating produced water chemistry using compositional data analysis can remove inaccurate correlations (Engle and Rowan, 2013;Engle and Blondes, 2014;Blondes et al., 2016). When they are applied to produced waters, compositional data analysis techniques include using isometric log-ratio transformations to understand whether concentration variations indicate water-rock or fluid mixing processes. Molar ratios of four analytes (calcium, sulfate, sodium, and chloride) were transformed using isometric log-ratios and were plotted (Figure 2C) for Bakken produced waters. The region near the origin represents equilibrium conditions with the evaporite minerals halite and sulfate. Samples from this study plotted within the range of USGS PWDB samples, but they plotted on the high end of the calcium-sulfate ratios ( Figure 2C). Previous work studying produced waters has suggested similarly elevated calcium-sulfate ratios may be the result of sulfate reduction . This suggests that sulfate reduction was more prevalent in samples from this study than other samples from the Bakken region.

Produced Waters Contain Low Microbial Loads With High Alpha Diversity
In order to determine the total microbial abundance in the evaluated Bakken region produced waters, qPCR analyses were conducted on the 14 collected samples. The median microbial load was found to be 1.15E+2 16S rRNA gene copies/mL of produced water. The microbial abundances across all produced water samples ranged from 2.15E+1 to 3.06E+5 16S rRNA gene copies/mL. The microbial load in well 11 was significantly higher than in the other samples and appeared to be an outlier (Table 1 and Supplementary Figure 3). However, it falls within the microbial loads recorded in the Marcellus Shale and Denver-Julesburg Basin, which range from 1.00E+5 to 2.10E+8 and 1.0E+3.1 to 1.0E+8.5, respectively (Lipus et al., 2017;Hull et al., 2018).
To assess the microbial community composition, we conducted 16S rRNA gene amplicon sequencing for 11 of the 14 produced water samples. No DNA could be amplified for the remaining three samples, potentially due to low overall biomass. Additionally, 10 blanks were prepared, two of which had measurable DNA and were therefore sequenced. 16S rRNA amplicon sequencing yielded a total of 675,581 raw sequences, of which 289,457 remained after quality control filtering (Supplementary Table 1). We also concurrently analyzed previously published sequences (Lipus et al., 2018) generated from samples collected at the same sites and within the same region using the same bioinformatic pipeline (Supplementary Table 1).
Alpha diversity was determined by counting the total number of ASVs in a sample and calculating the Shannon richness, Chao1 index, and Pielou's evenness metric for each site ( Table 3,  Supplementary Table 1, and Supplementary Figure 4). Each sample had an average of 58 unique ASVs per 1942 sequences. FIGURE 2 | (A) TDS and chloride measurements plotted for produced waters in this study. Linear trendlines report a high linear correlation value, displaying that chloride is the primary anion and well 11, with its lower TDS value, does not have an anomalous chloride value. The trendline excludes well 18, a sample that, based on its geochemistry and age, likely represents a mix of flowback and produced water. (B) TDS and sulfate measurements plotted for produced waters in this study. There is a higher linear association (R 2 = 0.8533) for these analytes when the youngest and spatially unique sample, well 18, is excluded. Based on measurements from this study and previous research, the elevated sulfate in well 18 could be related to residual chemicals from hydraulic fracturing operations or the product of flowback waters.   Figure 4). As a whole, our samples were significantly more rich, diverse, and even when compared to samples previously collected  Figure 4). Our samples also had a greater spread of Chao1 indices and a smaller spread of both Shannon indices and Pielou's evenness measurements (Supplementary Figure 4). The notable exception is that samples collected in January 2015 by Lipus et al. (2018) had comparable levels of evenness and a much smaller spread for all alpha diversity metrics (Supplementary Figure 4). One possible reason for this trend is that after several years of production, well conditions select for a microbial community with a high diversity of adapted microorganisms.

Hydrocarbon Associated Bacterial Families Dominate Microbial Community
The analyzed produced water samples were dominated by microbiota from 19 bacterial families, each of which represents ≥ 5% of sequences from any one sample. Twelve of the dominant families have been previously isolated and cultured from hydrocarbon reservoirs and 5 have been previously identified through sequencing hydrocarbonassociated environmental samples (Korenblum et al., 2012;Tatar, 2018). These dominant families comprise between 76.24% and 97.08% of the total microbial community in each well ( Figure 3A, Supplementary Table 3, and Supplementary Figure 5). The families Halanaerobiaceae, Methylomonaceae, Xanthobacteraceae, Burkholderiaceae, Desulfohalobiaceae, and Pseudomonadaceae were especially abundant, with an average relative abundance of 57.02% and a range between 4.43 and 83.24% (Figure 3A, Supplementary Table 2, and Supplementary Figure 5). The one exception is produced water from well 10, which was dominated by the Clostridiales Family XI and Shewanellaceae, representing 35.80 and 31.32%, respectively, of the total microbial community ( Figure 3A, Supplementary  Table 3, and Supplementary Figure 5). Shewanellaceae has previously been detected in saline environments, oil field wastes, and other hydrocarbon environments (Nealson and Scott, 2006), suggesting an adaptability to conditions similar to unconventional reservoirs. It was previously found in produced water from the Bakken region (Lipus et al., 2018), although only at low abundances, suggesting adaptation to well conditions over time. No other wells contained such high abundances of Family XI and Shewanellaceae. However, Family XI and Shewanellaceae were present in the remaining produced water samples with an average relative abundance of 2.17 and 1.58%, respectively ( Figure 3A, Supplementary Table 3, and Supplementary Figure 5). Two other identified bacterial families include Desulfobacteraceae and Desulfomicrobiaceae, which have an average relative abundance of 1.13 and 1.09% across all produced water samples, respectively ( Figure 3A, Supplementary Table 3, and Supplementary Figure 5). Both families were especially dominant in well 4, constituting 5.72 and 6.54%, respectively, of the total relative abundance (Figure 3A Figure 5). None of these taxa were found in every produced water sample and they were identified at average relative abundances of 0.88-3.50% across all wells. Archaea were also present in six of the 11 produced water samples and contributed a maximum of 0.89% of the relative abundance in any one well. The archaeal orders present in our samples included Methanobacteriales and Methanomicrobiales, which are common methanogens previously implicated in hydrocarbon degradation (Jiménez et al., 2016;Toth and Gieg, 2018).

Microbial Community Correlates With Geochemical Data
We completed several analyses in order to identify factors potentially contributing to and/or driving changes in the produced water microbial ecology. First, we constructed three non-metric multidimensional scaling (NMDS) plots using Bray-Curtis dissimilarity metrics. These ordination plots contain: (1) all samples and blanks from our current study (Supplementary Figure 6A), (2) samples and blanks from our current study as well as samples previously collected by Lipus et al. (2018) from the same sample sites (Supplementary Figure 6B), and FIGURE 3 | (A) Relative abundance of microbial families in produced water samples obtained from well separators. All families that represent =5% of sequences from any one sample are listed in the barplot, all other families are grouped together under "Other." (B) Non-metric multidimensional scaling (NMDS) plot of produced water samples with a stress value of 0.0720. This plot was constructed using Bray-Curtis distance calculated after sequence libraries were resampled to the depth of the sample with the fewest sequences from this experiment (1942 sequences). There was no significant clustering by well formation or well age (ANOSIM p > 0.05 for both), however, a Mantel test demonstrated a significant correlation between the microbial community and associated geochemical profile (R = 0.2336; p = 0.0430). Environmental vectors for geochemical parameters were fit onto the ordination plot, with the direction of the arrow corresponding to the direction of the gradient and the length of the vector proportional to the strength of the correlation between ordination and environmental variable. Only environmental vectors with a significant p-value (p < 0.05) are displayed; an NMDS plot with all environmental vectors as well as a table containing the R 2 and p-values for the corresponding vectors is located in Supplementary Figure 6. In this figure, TIC refers to Total Inorganic Carbon (TIC).
(3) samples and blanks from our current study and all samples previously collected by Lipus et al. (2018) from the Bakken region (Supplementary Figure 6C). Next, we used analysis of similarity (ANOSIM) calculations to confirm any visible clustering present in each of the three plots. We specifically tested for the clustering by the following categories, when possible: sample origin (tank, separator, or blank), well number, formation, well age, and sampling event. We found no statistically significant clustering based on sample origin, formation, or well age (ANOSIM: p > 0.05) in our first plot (Supplementary Figure 6A). However, incorporation of previously collected data at the sample sample sites revealed clustering by sample origin (ANOSIM: R = 0.1769; p < 0.05), well age (ANOSIM: R = 0.4350; p < 0.05), and sampling event (ANOSIM: R = 0.5622; p < 0.05). Samples did not cluster by well number or formation (ANOSIM: p > 0.05 for both). Interestingly, when we completed an ANOSIM with our experimental data and the paired 2018 data (but without our sequenced blanks), there was no separation by sample origin (ANOSIM: p > 0.05). These trends remained consistent when we analyzed our sequenced blanks, experimental data, and all previously published sequences by Lipus et al. (2018) (Supplementary Figure 6C). We found no significant clustering by sample origin, well number, or formation although there was significant clustering by well age (ANOSIM: R = 0.4533; p = 0.001) and sampling event (ANOSIM: R = 0.4874; p = 0.001) (Supplementary Figure 6C).
In order to examine the relationship between the microbial community within each well and the associated geochemical profile, we completed a Mantel test. We found a statistically significant, but weak relationship (R = 0.2336; p = 0.0430). Next, we calculated and fit vectors onto a fourth ordination plot, which contained only our experimental data, in order to identify environmental factors associated with produced water microbial structure (Figure 3B and Supplementary Figure 7). We determined that 11 of the 29 geochemical measurements have a significant correlation with the microbial data visualized on the NMDS plot ( Figure 3B and Supplementary Figure 7). The R 2 of the significant correlations was strong, ranging from 0.56 to 0.84 with an average R 2 of 0.65 (Supplementary Figure 7). TDS was significantly correlated with the microbial data, with an R 2 of 0.62. TDS is a measurement of the total amount of minerals, salts, metals, cations, and/or anions dissolved in water, therefore it is unsurprising that many of the TDS constituents also had a significant correlation with the microbial data. However, there were several notable trends. First, the Bakken Shale is known to have high NaCl concentrations compared to other shale regions ( Table 2 and Supplementary Table 2) (Lipus et al., 2018). Consistent with this, sodium and chloride were significantly correlated with the microbial data (R 2 = 0.62 and R 2 = 0.59, respectively), although NaCl% was not. Second, the R 2 for boron, calcium, copper, potassium, and manganese were higher than the R2 for TDS, suggesting that these ions may be especially linked to microbial metabolism within the Bakken Shale Formation. Other geochemical measurements with a significant correlation to the microbial data include bromide, sulfate, and TIC ( Figure 3B and Supplementary Figure 7).

DISCUSSION
To contribute to the expanding portfolio of produced water descriptions in the United States and build on previous work conducted in the Bakken Shale, we analyzed produced water samples from 14 different wells using integrated geochemical and microbiology techniques. Previous work demonstrates produced water from the Bakken region contains high TDS and salinity concentrations compared to other shale regions (Lipus et al., 2018;Wang et al., 2019). Our work supports this, with samples containing an average of 287,000 mg/L TDS and 88% wt NaCl across all wells ( Table 1). These conditions inhibit microbial growth, such that produced water generally exhibits low microbial loads. Previous work in the Bakken Shale reported microbial abundances of 1.00E+2 to 1.00E+5 16S rRNA gene copies/mL (Lipus et al., 2018). Consistent with this, we measured a median microbial load of 1.15E+2 16S rRNA gene copies/mL across all samples. The highest microbial abundance was recorded for produced waters from well 11 at 3.06E+5 16S rRNA gene copies/mL (Table 1 and Supplementary Figure 3). Although the microbial load in produced water from well 11 was higher than in the other collected samples within this dataset, the well age and geochemical characteristics, including major anion concentrations and isometric log-ratios, suggest that this water is representative of Bakken produced waters and that its low TDS is not the result of dilution due to hydraulic fracturing operations (Figure 2 and Supplementary Figure 2A). Rather, it is likely that its low TDS is due to the aquitard nature of the Bakken formation, which results in high spatial TDS variability (Bachu and Hitchon, 1996;Hitchon, 1996).
In addition to inhibiting microbial growth, the unique conditions in shale reservoirs also select for organisms that thrive in anaerobic, halophilic conditions. Four previous studies have explicitly examined the Bakken Shale microbial community in situ (Strong et al., 2014;An et al., 2017;Lipus et al., 2018;Wang et al., 2019). Strong et al. (2014) found a high abundance of Halanaerobium and Marinobacter in flowback water; An et al. (2017) discovered high abundances of Marinobacter, Halanerobium, and Desulfovermiculus in produced water samples; Lipus et al. (2018) found that produced samples were dominated by microbes in the Bacillales, Halanaerobiales, or Pseudomonadales orders; and Wang et al. (2019) reported a high abundance of Geobacter, Lactococcus, Enterococcus, and Bradyrhizobium in wastewater from newly fractured wells. Despite a significant difference in the structure of the microbial communities, our results were broadly congruent with those presented in these studies. Three of our dominant bacterial families belonged to the Bacillales, Halanaerobiales, and Pseudomonadales orders. Across all wells, Halanaerobium represented an average of 99.58% of the bacteria within the Halanaerobiales order. Additionally, Marinobacter and Bradyrhizobium were especially prevalent, representing an average of 56.59 and 95.26% of all microbiota in the Alteromonadales and Rhizobiales orders, respectively. It is unknown whether the microbial communities in the Bakken region in this study and other studies were native to the reservoir, or whether they were "seeded" or inoculated during well development and normal operating procedures. Regardless, the geochemical characteristics were within range of the Bakken region, strongly suggesting a longterm adaptation of the community, as opposed to a temporary augmentation to the system. Our samples were characterized by high alpha diversity and evenness compared to the other Bakken studies. They also contained high abundances of other taxa including members of the Methylomonaceae, Desulfobacteraceae, and Desulfomicrobiaceae families. Although these families have previously been detected in produced water, their unusually high abundances and the overall high evenness of these samples is notable. One hypothesis of the unusually high evenness is that the advanced age of these wells has selected for a more stable, even community.
We did not recover enough DNA for metagenomic analysis, however, previous studies of the abundant microorganisms shed some light in potential metabolic processes that may impact the oil and gas industry. For example, metagenomeassembled genomes (MAGs) of Halanaerobium recovered from the Marcellus and Utica Shales and a Halanaerobium strain isolated from the Barnett Shale were found to be capable of thiosulfate reduction (Daly et al., 2016;Liang et al., 2016;Lipus et al., 2017). Thiosulfate reduction is highly linked with microbially induced corrosion (MIC) (Choudhary et al., 2015) and sulfide production (Booker et al., 2017), which can damage well infrastructure and cause reservoir souring. Desulfobacteraceae and Desulfomicrobiaceae are also well known sulfate reducers that have previously been implicated in reservoir souring (Gieg et al., 2011;Engelbrektson et al., 2014;Vigneron et al., 2017). The Halanaerobium MAGs recovered from unconventional reservoirs contained genes encoding proteins involved with the biofilm formation process (Daly et al., 2016;Lipus et al., 2017). Microbial biofilms are a major issue in the oil and gas industry, as they can lead to clogging and increase the rates and/or severity of MIC (Johnson et al., 2008;Bottero et al., 2010;. Pseudomonas are especially well known due to their biofilm formation capabilities (Drenkard and Ausubel, 2002;Vikram et al., 2015) and have been shown to increase biofilm production when exposed to produced water (Vikram et al., 2014). Pseudomonas composed the majority of the sequences from the Pseudomonadles order in our samples (79.93% across all wells) except for well 4, which contained no Pseudomonas.
One of our goals was to understand the complicated geochemical and microbial dynamics present within the Bakken shale. We determined that 11 of the 29 geochemical measurements have a significant correlation with the microbial data (Supplementary Figure 6). These include TDS and many of its constituents such as sodium and chloride concentrations, which are higher in the Bakken region than in other producing shale formations (Lipus et al., 2018;Wang et al., 2019). This suggests these parameters shape the microbial ecology of Bakken Shale produced water, and these variables have an expected positive trend with TDS. Sulfate concentration also had a significant correlation with the microbial data, although our geochemical analysis revealed a negative trend between TDS and sulfate. One possible reason for this may be that certain sulfate minerals, specifically gypsum and anhydrite, are less soluble in higher TDS waters (Bock, 1961;Tanveer and Chen, 2020). This theory would suggest that low TDS wells, such as well 11, would allow for more sulfate to remain in solution; this low TDS, high sulfate niche may provide a unique environment for microbial activity. More research is needed to determine potential sulfate solubility and related microbial activity in unconventional reservoirs.
The isometric log-ratio plot showed that produced water samples from this study plotted within the range of USGS PWDB samples, but they plotted on the high end of the calciumsulfate ratios ( Figure 2C). This suggests that sulfate reduction was more prevalent in samples from this study than other samples collected from the Bakken region. This sulfate reduction could have occurred at any point during the geologic life of this produced water. However, previous research has detected the presence of sulfate reducing bacteria in Bakken produced waters, specifically in those with lower salinities . This is consistent with our observations in well 11, which has a low TDS, a high sulfate concentration, and high microbial load that is dominated by sulfate reducers. The majority of these sulfate reducers (14.96%) belong to the Desulfovermiculus genera, which was previously detected in the Bakken region by An et al. (2017). Members of this taxon have also been isolated from hydrocarbon associated environments, and shown to be chemolithoautotrophic and halophilic (Belyakova et al., 2006). Closely related organisms in the Desulfovibrionales order have also previously been associated with both MIC and reservoir souring (Gieg et al., 2011;Chen et al., 2015).
Produced water from well 11 and 18 both contain elevated sulfate concentrations, however, only well 11 had a high microbial load. This is most likely due to the high TDS concentration present in the produced water from well 18, which inhibits microbial growth. Given this context, it is possible that the lower salinity in combination with high sulfate concentrations resulted in the high microbial abundance observed in well 11, and suggests that sulfate reduction may have been actively occurring in this sample. Since the chloride and TDS trends suggest well 11 to be in range of other Bakken wells, this further suggests that the Bakken Shale may host heterogenous niches with conditions that are conducive to microbial growth and activity. Although our experimental data for well 18 does not support active microbial sulfate reduction, it is important to understand that the Bakken Shale is a dynamic ecosystem. First, consider that well 18 is 1 month removed from a completed hydraulic fracturing procedure, thus any new microbes introduced during the fracturing process may still be below the detection limit. Additionally, previous work demonstrates that the shaleassociated microbial community shifts in response to changing geochemical conditions in the months after oil and gas production begins (Daly et al., 2016). Finally, when we analyzed our data concurrently with data previously collected within the same region (Lipus et al., 2018), we found a significant difference in the microbial community that correlated with well age and sample event. This suggests microbial community divergence at well age orders of years as opposed to shorter termed time periods.
This work represents the first study to couple geochemical trend analysis with microbial data for oil and gas systems, demonstrating the geochemical parameters have the potential to shape the microbial community present in the Bakken Shale. Our integrated approach provides valuable context to the three previously completed studies in this region, however, additional work must be completed in order to understand the complex dynamics present in hydraulically fractured wells. For example, our results demonstrate the microbial community can change within the same wells after a year of production; additional 16S rRNA gene sequencing in this region would allow us to better determine whether seasonality, well age, or another unidentified environmental factor is the primary driver of microbial community in the Bakken Shale. Utilizing metagenomic and/or metatranscriptomic techniques within this region will help us identify mechanistic links between the produced water microbiota and chemistry. Finally, a meta-analysis of all hydraulically fractured well chemical and microbial data may help us identify broader trends present in all unconventional reservoirs. Ultimately, our goal is to better understand the microbiology and geochemistry of hydraulically fractured wells so we can continue to safely and efficiently utilize our natural resources.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://www.ncbi.nlm.nih. gov/, PRJNA611830.

AUTHOR CONTRIBUTIONS
KT filtered the samples for microbial analysis, prepared and sequenced the 16S rRNA gene sequencing libraries, analyzed the 16S rRNA gene sequencing data, and wrote the manuscript. JG analyzed the geochemical data and wrote the manuscript. DL collected the field samples, recorded the field data, and preserved the field samples. PS filtered the samples for microbial analysis, collected the qPCR data, and prepared and sequenced the 16S rRNA gene sequencing libraries. MS collected the ICP-OES and IC measurements. DG procured the funding, collected the field samples, recorded the field data, preserved the field samples, and analyzed the geochemical data and qPCR data. All the authors reviewed and provided editorial feedback.

FUNDING
This project was supported in part by an appointment to the Science Education Programs at National Energy Technology Laboratory (NETL), administered by ORAU through the U.S. Department of Energy Oak Ridge Institute for Science and Education. This work was also performed in support of the U.S. Department of Energy's Fossil Energy Crosscutting Technology Research Program. The Research was executed through the NETL Research and Innovation Center's Onshore Unconventional Research. Research performed by Leidos Research Support Team staff was conducted under the RSS contract 89243318CFE000003.