The Volatilome: A Vital Piece of the Complete Soil Metabolome

Soils harbor complex biological processes intertwined with metabolic inputs from microbes and plants. Measuring the soil metabolome can reveal active metabolic pathways, providing insight into the presence of specific organisms and ecological interactions. A subset of the metabolome is volatile; however, current soil studies rarely consider volatile organic compounds (VOCs), contributing to biases in sample processing and metabolomic analytical techniques. Therefore, we hypothesize that, overall, the volatility of compounds detected using current metabolomic analytical techniques will be lower than that of undetected compounds, a reflection of missed VOCs. To illustrate this, we examined a peatland metabolomic dataset collected using three common metabolomic analytical techniques: nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC-MS), and Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS). We mapped the compounds to three metabolic pathways (monoterpenoid biosynthesis, diterpenoid biosynthesis, and polycyclic aromatic hydrocarbon degradation), chosen for their activity in peatland ecosystems and involvement of VOCs. We estimated the volatility of the compounds by calculating relative volatility indices (RVIs), and as hypothesized, the average RVI of undetected compounds within each of our focal pathways was higher than that of detected compounds (p < 0.001). Moreover, higher RVI compounds were absent even in sub-pathways where lower RVI compounds were observed. Our findings suggest that typical soil metabolomic analytical techniques may overlook VOCs and leave missing links in metabolic pathways. To more completely represent the volatile fraction of the soil metabolome, we suggest that environmental scientists take these biases into consideration when designing their studies and interpreting their data and/or add direct online measurement methods that capture the integral role of VOCs in soil systems.


INTRODUCTION
As a complex and heterogeneous ecosystem, soil harbors a myriad of biological processes that are challenging to uncover. Two major contributors to biological soil processes are microbial communities and plant roots. Soil microbial communities are engines of chemical interconversion: microbes produce and consume chemical substrates for metabolism, generating metabolites as waste by-products. Tightly interwoven with the soil microbial network, plant roots also emit metabolites as exudates (Bertin et al., 2003; Vives-Peris et al., 2020). These diverse metabolites form the metabolome, which can be interrogated to provide insight into belowground biological processes. Volatile metabolites, constituting the volatilome (Amann et al., 2014), represent a part of the comprehensive metabolome. Volatile organic compounds (VOCs) have high vapor pressures, causing them to enter the gaseous phase depending on environmental conditions. While atmospheric chemists routinely measure VOCs directly using specialized techniques, VOCs are rarely considered in the burgeoning collection of soil microbial metabolomic studies. Could removing bias in sample processing and including more direct measurements of VOCs help fill in a missing, important part of the soil metabolome?
Challenges inherent to most approaches evaluating soil organic matter composition and soil metabolomics make it difficult to simultaneously detect VOCs. First, soil metabolomic field studies start with collecting soil samples to bring back to the lab for sample processing and metabolic measurements. These ex situ methods overlook VOCs that escape to the atmosphere either prior to sample collection or during disruptive soil sampling (Hewitt, 1996). This artifact may even affect VOC-resolving techniques, such as gas chromatography-mass spectrometry (GC-MS), where VOC loss to the atmosphere prior to sample analysis is a possibility (Eriksson et al., 2001). Second, some metabolomics measurement techniques require an initial liquid chromatography (LC) step, as is the case with LC-mass spectrometry (LC-MS) to characterize compound structure. This process immediately limits the detectable VOCs to those dissolved in the liquid phase. The solvent used to prepare samples in the lab for LC and other direct injection methods can also bias the metabolites extracted from soil samples; methanol preferentially extracts semipolar to nonpolar molecules, whereas water extracts polar molecules (Hollywood et al., 2006). Furthermore, soluble VOCs may partition between the gas and liquid phases, making quantification unreliable, while insoluble VOCs will go undetected. Finally, inherent biases, including target size of compounds and ionization mode (positive vs negative), exist for each of the most widely used metabolomic analysis techniques (Table 1). Some of these challenges to measuring the soil volatilome may be addressed by adapting online measurement methods from atmospheric chemistry that directly measure VOCs in the gas phase, such as proton-transfer-reaction time-of-flight MS (PTR-TOF-MS).
Increasingly, the soil volatilome is being characterized in its own right using these approaches, either by measuring gases at the soil-atmosphere surface (e.g., using soil incubation chambers [Asensio et al., 2007]) or, more recently, with belowground diffusive probes that measure VOCs in situ (Gil-Loaiza et al., 2020).
VOCs also play important ecological roles in soil and influence atmospheric chemistry. Unlike their non-volatile counterparts, VOCs readily exchange between the soil and atmosphere. In the atmosphere, VOCs are active in photochemical reactions and secondary aerosol formation, and thereby affect air quality (Chameides et al., 1988; Park et al., 2013; Ghirardo et al., 2020), climate (Müller et al., 2017), and precipitation dynamics (Zhao et al., 2016). In the soil, VOCs are important signaling molecules that drive microbe-microbe, plant-microbe, and plant-plant interactions (Penuelas et al., 2014). Specifically, like non-volatile metabolites, microbial VOCs can promote plant growth (Tahir et al., 2017; Tyagi et al., 2018) and plant-root VOCs can either deter or attract microbes (Bitas et al., 2013). But unlike non-volatile metabolites, VOCs diffuse through soil more readily, extending their zone of influence. For example, while the rhizosphere zone influenced by root exudates may be restricted to millimeter scales for non-volatile compounds, VOCs may diffuse centimeters or farther from roots, thereby extending the reach of the effective rhizosphere (de la Porte et al., 2020). These examples emphasize some of the unique roles of VOCs in the soil and signify that capturing VOCs within the complete soil metabolome is important for resolving belowground processes and aboveground interactions.
Despite the important roles of soil VOCs, we argue that VOCs have often been overlooked in soil metabolomic studies. These studies typically map metabolites to metabolic pathways to guide expectations for specific biological processes. However, metabolite volatility is rarely considered, even in cases where standard metabolomic analytical techniques capture VOCs. In fact, the volatility of many soil organic compounds is unknown, confounding our current understanding of whether VOCs are underrepresented. Despite a lack of measurements of the volatility of organic compounds, tools that predict volatility from molecular functional groups (Hilal et al., 2007; Nannoolal et al., 2008; Pankow and Asher, 2008) have been widely adopted by the atmospheric chemistry community.
Here, we integrate disciplinary approaches to predict metabolite volatility in three VOC-containing metabolic pathways. We show that compounds with high volatility are disproportionately undetected in a peatland metabolomic dataset derived from three techniques (GC-MS, Fourier-transform ion cyclotron resonance MS [FTICR-MS] by direct injection, and nuclear magnetic resonance [NMR]). The peatland ecosystem is ideal for evaluating the representation of VOCs in metabolomics because we expect high quantities of VOCs (Seewald et al., 2010) as fermentation products of anaerobic metabolism in waterlogged, anoxic conditions. This approach establishes a baseline understanding of the soil volatilome and the implications of its underrepresentation in current soil metabolomics studies.

CONCEPT
[Table 1 residue, partially recoverable. Sample extraction: water (polar compounds), or Folch sequential methanol, chloroform, and water (non-polar and polar compounds in biphasic layers; only the top aqueous layer was used in this study). Other recoverable entries: 19-500 Da; good for VOCs; volatility range is dependent on the chromatography column.]
To predict the volatility of metabolites along metabolic pathways, we adapted tools used to estimate VOC partitioning between the gaseous and condensed aerosol phases under the assumption of standard conditions (i.e., temperature and pressure) and dry sorbent material. Detailed discussions of the properties of volatility and the factors affecting it are available elsewhere (Hilal et al., 2003; Nannoolal et al., 2008; Compernolle et al., 2011; Tang et al., 2019). We calculated metabolite vapor pressure (P; atm) using SIMPOL.1 (Pankow and Asher, 2008), which accounts for the impact of functional groups. Specifically, we estimated vapor pressure using the following equation: $\log_{10} P = b_0 + \sum_k \nu_k b_k$ for functional groups k = 1, 2, 3, ..., where $b_0$ is a constant, $b_k$ is the functional group contribution term for group k, and $\nu_k$ is the number of groups of type k in the compound. For example, carbon-carbon double bond (C=C) and aromatic ring functional groups each decrease P to a different degree. The method did not specify the impact of phosphate groups, which are common in metabolic pathways, so we used nitrate as a proxy with the assumption that these functional groups have a similar relative contribution to P (Nannoolal et al., 2008). The volatility of VOCs in a given environment can be expressed by the tendency to partition to the gas vs condensed phase. In the atmosphere, VOC partitioning (ξ) to the gas phase increases with vapor pressure, which is by convention converted using the ideal gas law to a mass-based saturation vapor concentration ($C^*$; µg m$^{-3}$) that accounts for molecular mass ($\log_{10} C^* = \log_{10}(PM/RT)$, where M is molecular mass, R the universal gas constant, and T temperature). Under clean atmospheric conditions, thresholds for nonvolatile, intermediate volatility, and volatile are on the order of $C^*$ = 0.01, 1, and 100 µg m$^{-3}$, respectively, or more conveniently on a log scale: $\log_{10} C^*$ = −2, 0, and 2 (Donahue et al., 2006). VOC partitioning also depends on the total availability of condensed-phase organic molecules (e.g., total aerosol; $C_{Total}$).
The same VOC will appear less 'volatile' in a polluted atmosphere with greater partitioning on high aerosol loadings [$\xi_i = 1/(1 + C_i^*/C_{Total})$]. Soil contains large quantities of organic matter, but theories linking soil $C_{Total}$ to thresholds for volatility in the subsurface have not been established. Here, we therefore report a relative volatility index (RVI) using $\log_{10} C^*$ (RVI = $\log_{10} C^*$) as the volatility scale, with the understanding that gas phase partitioning will be dependent upon the environment in soil pores (temperature, moisture, pressure) or at the soil-atmosphere interface. While the RVI does not give an absolute indication of whether a compound is volatile in the soil, it can be used to compare compound volatilities relative to one another.
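The calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the group-contribution coefficients in the example are placeholders chosen for readability, not the published SIMPOL.1 values (which should be taken from Pankow and Asher, 2008). The sketch also assumes T = 298.15 K and makes explicit the factor of 10^6 needed to express C* in µg m^-3 when P is in atm and M in g mol^-1.

```python
import math

R_ATM_M3 = 8.205736e-5  # universal gas constant in m^3 atm K^-1 mol^-1


def log10_vapor_pressure(group_counts, coefficients, b0):
    """Group-contribution sum: log10 P = b0 + sum_k(nu_k * b_k), with P in atm."""
    return b0 + sum(n * coefficients[g] for g, n in group_counts.items())


def relative_volatility_index(log10_p_atm, molar_mass, temperature=298.15):
    """RVI = log10 C*, where C* = P * M / (R * T) expressed in ug m^-3."""
    # P [atm] * M [g/mol] / (R * T) gives g m^-3; the 1e6 factor converts to ug m^-3.
    return log10_p_atm + math.log10(1e6 * molar_mass / (R_ATM_M3 * temperature))


def condensed_fraction(rvi, c_total):
    """xi = 1 / (1 + C* / C_Total), the partitioning expression quoted in the text."""
    return 1.0 / (1.0 + 10.0 ** rvi / c_total)


# PLACEHOLDER coefficients for illustration only (not SIMPOL.1 values).
placeholder_b0 = 2.0
placeholder_bk = {"carbon": -0.5, "hydroxyl": -2.0}

# Hypothetical compound: 10 carbons, 1 hydroxyl group, M = 154.25 g/mol.
log10_p = log10_vapor_pressure({"carbon": 10, "hydroxyl": 1}, placeholder_bk, placeholder_b0)
rvi = relative_volatility_index(log10_p, molar_mass=154.25)
print(f"log10 P = {log10_p:.2f}, RVI = {rvi:.2f}")
```

Because the RVI is a log-scale quantity, a difference of one RVI unit between two compounds corresponds to a tenfold difference in saturation vapor concentration, which is why it is suited to relative rather than absolute comparisons.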
To provide a reference framework for assessing the extent of detected volatiles, we used metabolic pathway maps that visualize metabolic reactions and their intermediates. Metabolic pathway maps, such as those in the KEGG metabolic database (Kanehisa and Goto, 2000), help soil scientists visualize active soil processes. To illustrate our concept, we assessed a peatland soil metabolomic dataset generated using typical soil metabolomic analytical techniques, in which metabolites were first extracted from soil (Folch et al., 1957) and then analyzed by FTICR-MS, ¹H NMR, and GC-MS (Wilson et al., 2021; methods included in Table 1). While FTICR-MS provides compound masses that can be used to predict formulae, ¹H NMR and GC-MS provide spectral data that must be matched to compounds in reference databases. For all metabolite analyses, previously reported methods, standards for peak picking, compound databases, and formula assignment were used (e.g., Weljie et al., 2006; Hiller et al., 2009; Kind et al., 2009; Tolić et al., 2017; Tfaily et al., 2018).
All metabolomic data were combined, matched to KEGG compound IDs when possible, and mapped to pathways using the KEGG Pathway Mapper (Kanehisa and Sato, 2020). We note that some formulas matched to more than one isomer or compound, and all matches were included in this analysis. We selected three VOC-containing pathways that we expect to be present in peatland ecosystems: (1) monoterpenoid biosynthesis (Mono Bio), describing the formation of monoterpenes, which are highly volatile; (2) diterpenoid biosynthesis (Di Bio), describing the formation of diterpenes, which include many nonvolatile compounds with a few exceptions; and (3) polycyclic aromatic hydrocarbon degradation (PAH Deg), describing the breakdown of hydrocarbons and including many semi-volatile compounds. Along these pathways, we calculated average RVIs for detected and undetected compounds in the peatland dataset and tested for significant differences using the non-parametric Mann-Whitney U test.
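The comparison step can be illustrated with a short, dependency-free Python sketch. It is an illustration under stated assumptions rather than the analysis actually run: the compound identifiers and RVI values are invented, the helper names are hypothetical, and the p-value uses a normal approximation with tie correction and no continuity correction, so it will differ slightly from an exact test on small samples.

```python
import math


def split_by_detection(pathway_rvi, detected_ids):
    """Split a {compound_id: RVI} mapping into detected and undetected RVI lists."""
    detected = [r for c, r in pathway_rvi.items() if c in detected_ids]
    undetected = [r for c, r in pathway_rvi.items() if c not in detected_ids]
    return detected, undetected


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (midranks, normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        midrank = (start + end) / 2 + 1  # mean of the 1-based ranks in this tie group
        for j in range(start, end + 1):
            ranks[pooled[j][1]] = midrank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Invented example: eight pathway compounds, four of them detected.
pathway_rvi = {"cpd_a": -5.1, "cpd_b": -4.2, "cpd_c": -6.0, "cpd_d": -3.3,
               "cpd_e": 1.2, "cpd_f": 0.4, "cpd_g": 2.5, "cpd_h": -0.8}
detected_ids = {"cpd_a", "cpd_b", "cpd_c", "cpd_d"}

detected, undetected = split_by_detection(pathway_rvi, detected_ids)
u_stat, p_value = mann_whitney_u(undetected, detected)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

A rank-based test is a reasonable fit here because RVI distributions within a pathway can be strongly skewed by a few compounds with extremely low volatility.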

OVERALL PATTERNS
In all our focal pathways, most compounds were detected using FTICR-MS, with only phthalate detected by GC-MS and catechol by NMR (both from the PAH Deg pathway). All these pathways contain mostly secondary metabolites, compounds that tend to exceed the detectable target range of NMR (Table 1). NMR is well suited to measure smaller molecules that are in higher abundance, particularly primary metabolites, some of which are VOCs (e.g., catechol, methanol, and acetone). Indeed, the average RVI of NMR-detected compounds across all three pathways was +2.1 ± 7.2. Therefore, NMR may be a good option for detection of small, primary VOC metabolites; however, the technique suffers from biases that may preclude detection of many VOCs (Table 1). On the other hand, based on the RVIs of the compounds in our focal pathways, the GC-MS and FTICR-MS standard soil metabolomic analysis techniques were better suited to detect compounds with lower volatility.

Diterpenoid Biosynthesis
Diterpenoids are a class of molecular compounds containing four joined isoprene (C5H8) units that include momilactones, oryzalexins, gibberellins, and kaurenes. The capacity to biosynthesize diterpenoids is present in plants, fungi, and select bacteria (Gutiérrez-Mañero et al., 2001; Zi et al., 2014; Tang et al., 2015). The roles of diterpenoids in plants are diverse and include pathogen defense, plant growth effectors, signaling, and abiotic stress responses (Lu et al., 2018; Murphy and Zerbe, 2020). Fungi and bacteria also produce diterpenoids as antimicrobial agents and plant growth promoters (de Boer and de Vries-van Leeuwen, 2012; Zhao et al., 2018).
We elaborated on the Di Bio pathway because a high proportion of its metabolites were detected in the peatland dataset (40.8%; Figure 1A), strongly suggesting that this metabolism was active in the peatland. Previous studies in peatland ecosystems examining VOC emissions in situ (PTR-TOF) also detected diterpenes (Li et al., 2020), signifying the presence of volatile diterpenes in peatlands. In our peatland dataset, the average RVI of undetected compounds was significantly higher than that of detected compounds (RVI undetected = −0.97 ± 5.7 vs RVI detected = −4.6 ± 2.8, p < 0.0001; Figure 1A). After removing five undetected outliers with extremely low RVIs (−14, −16, −19, −21, and −26), RVI undetected = +0.04 ± 3.4. This indicates that standard metabolomic methods preferentially detect compounds with lower volatilities.
A closer examination of the gibberellin biosynthesis sub-pathway within the Di Bio pathway shows that almost all of the gibberellins were detected in the peatland dataset; however, there was a series of undetected intermediate compounds on the pathway to their production (Figure 1B). Gibberellins are plant hormones that promote growth and root elongation (Tanimoto, 2005), and some plant growth promoting bacteria can also produce gibberellins (Gutiérrez-Mañero et al., 2001; Bottini et al., 2004). Within this sub-pathway, five compounds between ent-Copalyl-PP and GA12-aldehyde were undetected (Figure 1B) and, furthermore, had higher average RVIs than the detected compounds (RVI undetected = +0.34 ± 2.2 vs RVI detected = −5.4 ± 2.1, p < 0.0001). In support of the SIMPOL.1 prediction that these undetected kaurene compounds had high volatility, previous research has found kaurene to be volatile and emitted by plant flowers, stems, leaves, and roots (Yáñez-Serrano et al., 2018). The capacity to synthesize kaurene compounds within this sub-pathway is shared between plants, fungi, and bacteria (Salazar-Cerezo et al., 2018). Therefore, kaurene should be produced regardless of source, but may remain undetected due to its higher volatility.

Monoterpenoid Biosynthesis
Monoterpenoids are molecules with two joined isoprene units that include α-pinene, linalool, camphor, and iridoids. Plants are the primary producers of monoterpenoids, and there are very few reports of bacteria or fungi capable of this synthesis (Penuelas et al., 2014). It is well established that plants emit monoterpenes from flowers to attract pollinators (Barragán-Fonseca et al., 2020) and from their roots to attract beneficial microbes and small eukaryotes like nematodes (Ali et al., 2011). Released monoterpenes also deter pathogens through their antimicrobial and antifungal attributes (Lee et al., 2016; Reis et al., 2016).
We focused on the Mono Bio pathway map because, unlike diterpenes, most monoterpenes are volatile, and we therefore expected fewer compounds from this pathway to be present in our peatland dataset. Indeed, we detected only 20% of the compounds in the Mono Bio pathway (Figure 1A). As expected, the overall RVIs were higher in the Mono Bio than the Di Bio pathway, signifying a higher overall volatility of the monoterpene compounds (+2.7 vs. −1.6, respectively). Similar to the Di Bio pathway, the average volatility of the detected compounds was significantly lower than that of the undetected compounds (RVI undetected = +5.4 vs. RVI detected = −1.3, p < 0.001).
The sub-pathway to iridoid compound biosynthesis contained a majority of the detected compounds, including loganin, secologanin, and loganate (Figure 1C). These detected iridoids had a lower volatility than the seven missing compound intermediates stemming from geranyl-PP (RVI undetected = +2.79 vs. RVI detected = −8.94, p < 0.001; Figure 1C). This represents another example of intermediate metabolites that should be present in the soil but are absent from the dataset and have a significantly higher volatility.

Polycyclic Aromatic Hydrocarbon Degradation
Polycyclic aromatic hydrocarbons (PAHs), a class of chemicals with two or more benzene rings fused together, occur naturally in coal, oil, and gasoline or can be produced through the incomplete combustion of these materials. PAHs in soils are often from anthropogenic sources, as fossil fuel combustion creates atmospheric emissions that deposit on land (Malawska et al., 2006), but natural PAHs can also form, for example, from microbial decomposition of plant residues in aeration-exposed peatlands that go through seasonal thaws (Gabov et al., 2020). Soils, particularly peatlands, are the main reservoir for PAHs in the environment (Wilcke and Amelung, 2000), where PAHs are persistent and difficult to break down because their hydrophobic properties cause them to bind strongly to soil particles (Gabov et al., 2020). Some bacteria (Déziel et al., 1996; Ghosal et al., 2016) and fungi (Hammel, 1995; Kadri et al., 2017) are capable of PAH degradation; however, in peatlands, PAHs accumulate at fast rates due to low degradation rates in the highly anaerobic conditions with high organic content. Yet, some peatlands are capable of higher degradation rates; for example, Ledum peatlands can degrade PAHs at a greater rate than Sphagnum peatlands (Wang et al., 2019).
We focused on the PAH Deg pathway because, as expected, our peatland dataset contained a high number of compounds from this pathway (40%, Figure 1A). Furthermore, PAHs and their degradation products are semi-volatile (Ghosal et al., 2016) and therefore have the potential to be missed using standard metabolomic techniques. Consistent with our above findings, the average volatility of the detected compounds was significantly lower than that of the undetected compounds (RVI undetected = +0.88 ± 2.4 vs RVI detected = −2.2 ± 2.1, p < 0.001).
Compounds from sub-pathways within the PAH Deg pathway were detected in the peatland dataset to different degrees. From the benzo[a]pyrene degradation sub-pathway, almost all compounds were detected, except for benzo[a]pyrene itself. Benzo[a]pyrene is the largest of these PAHs, with five fused rings, and the compounds from this sub-pathway had the lowest volatilities overall (average RVI = −3.5 ± 2.3). In contrast, compounds from the other sub-pathways (for pyrene, anthracene, phenanthrene, and fluorene) had patchier detection, but also higher RVIs on average. For example, anthracene, a PAH with three rings, and its degradation products have an average RVI of +0.20 ± 2.6 (for a comparison of benzo[a]pyrene and anthracene degradation, see Figure 1D).

CONCLUSION
Here, we provide compelling evidence that typical soil metabolomic analytical techniques miss some soil VOCs. As a result, they underestimate the role of the volatilome in the total soil metabolome and make it difficult to conclude which pathways are active. We showed that compounds undetected in a peatland dataset had significantly higher estimated volatilities than those detected within the context of three important VOC-containing metabolic pathways. There are several reasons that a compound could be undetected, including low steady state concentrations, chemical instability, short lifetime, and fast metabolism. While these other processes could be affecting compound detectability, we argue that non-volatile compounds are just as susceptible to them as volatile compounds, and that these processes therefore do not affect our conclusions. Given the plethora of known and currently unknown metabolic pathways in soil, these results only begin to unearth the potential for a missing volatilome in current soil metabolomic research projects.
Already, researchers often use more than one measurement method because no single technique can capture all metabolites in a sample. While existing techniques can be tuned to target specific compounds, there is currently no global method that can provide molecular identification of all chemicals in a system at high time resolution. Each technique is specialized to target different sizes and classes of compounds and comes with its own biases depending on how samples were processed and data analyzed (Table 1). Furthermore, biases inherent in sample collection, extraction, and measurement can compound; therefore, the methodology for each step should be carefully considered. Adjustments in sample processing (e.g., collecting samples in air-tight containers or capturing VOCs in adsorptive cartridges) and selection of analysis methods could help gear soil metabolomic measurement techniques to capture more VOCs. Additionally, in situ VOC measuring techniques, such as PTR-TOF-MS or GC-CI-MS (Table 1), could be added to the soil metabolomic repertoire to directly target volatile metabolites. Moreover, online VOC analytical techniques yield faster measurements than FT-ICR-MS, GC/LC-EI-MS, and NMR: PTR-TOF-MS provides near-continuous (1 Hz) data for uninterrupted analysis, whereas GC-CI-MS requires a longer measurement time per sample but yields greater structural information. New efforts to integrate VOC measuring techniques to capture and characterize the complete soil metabolome will provide a deeper understanding of the complex biological processes occurring belowground.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. These data can be found here: SPRUCE long-term repository (doi: https://doi.org/10.25581/spruce.083/1647173).

AUTHOR CONTRIBUTIONS
LM, LH, JK, and MT conceived of the concept and approach. LH and LM wrote the manuscript. MT provided the peatland dataset. LH curated the data and chose metabolic pathways. KG and LH calculated compound RVIs. JK contributed his expertise in volatility predictive models. LH produced the figure. All authors contributed to editing the manuscript.