Using Diffuse Reflectance Spectroscopy as a High Throughput Method for Quantifying Soil C and N and Their Distribution in Particulate and Mineral-Associated Organic Matter Fractions

Large-scale quantification of soil organic carbon (C) and nitrogen (N) stocks and their distribution between particulate (POM) and mineral-associated (MAOM) organic matter is deemed necessary to develop land management strategies to mitigate climate change and sustain food production. To this end, diffuse reflectance mid-infrared spectroscopy (MIR) coupled with partial least square (PLS) analysis has been proposed as a promising method because of its low labor and cost, high throughput and the potential to estimate multiple soil attributes. In this paper, we applied MIR spectroscopy to predict C and N content in bulk soils, and in POM and MAOM, as well as soil properties influencing soil C storage. A heterogeneous dataset including 349 topsoil samples were collected under different soil types, land use and climate conditions across the European Union and the United Kingdom. The samples were analyzed for various soil properties to determine the feasibility of developing MIR-based predictive calibrations. We obtained accurate predictions for total soil C and N content, MAOM C and N content, pH, clay, and sand (R2> 0.7; RPD>1.8). In contrast, POM C and N content were predicted with lower accuracies due to non-linear dependencies, suggesting the need for additional calibration across similar soils. Furthermore, the information provided by MIR spectroscopy was able to differentiate spectral bands and patterns across different C pools. The strength of the correlation between C pools, minerals, and C functional groups was land use-dependent, suggesting that the use of this approach for long-term soil C monitoring programs should use land-use specific calibrations.


INTRODUCTION
Climate change and land management play a major role in the accrual or loss of global soil organic carbon (C) stocks. In this context, C distribution in soil organic matter (SOM) fractions is a crucial early indicator of soil C changes and provides an opportunity to monitor and model C sequestration and its response to agricultural practices . In particular, soil C distribution in particulate (POM) and mineral-associated (MAOM) organic matter has been proposed as an effective approach to study soil C responses to global changes and land use, and inform management decisions (Wang et al., 2019). POM is constituted of coarse partially decomposed organic material which if not occluded in aggregates is unprotected, while MAOM is organic matter associated with fine mineral soil particles by chemical bonding or physical occlusion, which may confer it protection from fast decomposition (Lavallee et al., 2020). These two functionally distinct SOM physical fractions have contrasting pathways of formation (Cotrufo et al., 2015), as well as mechanisms of protection (Lützow et al., 2006), and therefore respond differently to disturbances, such as N additions, warming or elevated CO 2 .
Soil C and N distribution in MAOM and POM can, therefore, be used to evaluate soil C vulnerability to disturbance as well as potential to further accrual (Cambardella and Elliott, 1992;Christensen, 2001). The response of MAOM and POM to changes in agricultural practices as well as climate depend on multiple and co-occurring factors, which contribute to the overall SOM responses. For example, C sequestration in MAOM requires more N than in POM (Cotrufo et al., 2019) and it is highly controlled by soil mineralogy, while the changes in POM depend largely on vegetation cover and residue decomposition and is particularly vulnerable to management decisions and climate change (Córdova et al., 2018;Mikutta et al., 2019;Soucémarianadin et al., 2019).
Across a large geographical area, parent material and climate can lead to substantial changes in soil properties (Arrouays et al., 2014;Ballabio et al., 2019). Therefore, relatively fine spatial sampling is needed in order to capture these complex relationships between variables as well as inherent soil variability. Overall, there is a strong interest in accurately quantifying soil C and N stocks at a global scale to understand better the processes that promote C accumulation (Yigini and Panagos, 2016). However, to obtain an appropriate sampling density for geostatistical analysis, conventional laboratory analyses of SOM fractions and their elemental C and N content become inordinately expensive and time-consuming. This is especially important for models that infer spatiotemporal dynamics of soil C and N from changes in environmental predictors (i.e., climate, land use, texture), which must be calibrated and validated with measured data for accurate forecasting (Lugato et al., 2021).
Infrared spectroscopy has been proposed as an alternative method for a rapid determination of soil quality (Seybold et al., 2019) as well as to predict the distribution of SOM fractions (Baldock et al., 2018). Mid-infrared (MIR) spectroscopy combined with chemometrics is a feasible quantitative analysis method in soils (Sanderman et al., 2020). Additionally, MIR provides detailed information about fundamental vibrations of minerals (i.e., quartz -Si-O frequencies) and organic matter (Soriano-Disla et al., 2014). Unlike near-infrared spectroscopy (NIR), MIR has allowed the identification of several moieties, including aliphatic, methyl, amides III, and polysaccharide components associated with specific SOM fractions in soils (Parikh et al., 2014;Calderón et al., 2017). However, few studies have tested the use of MIR spectroscopy for the prediction of C and N content in SOM fractions with calibrations obtained from soils distributed across a broad range of climates and ecosystem types at a continental scale. Given that MIR spectroscopy provides a high spectral resolution and that MIR spectral characteristics have been associated with specific SOM functional groups and minerals, we hypothesize that MIR can be useful in predicting C and N in MAOM and POM fractions through chemometric calibrations at continental scale applications. MIR has thus not only the potential to quantify soil properties, but also to help to identify regions in which soils are most vulnerable to C losses.
The selection of an optimal chemometric calibration method is essential to estimate soil attributes using the MIR spectra. This could help overcome the difficulties of analyzing large quantities of soil samples. In recent years, various quantitative methods have been applied to MIR spectral data to predict different soil properties such as linear regression (MLR), principal component regression (PCR), and partial least square (PLS) regression (Vohland et al., 2011). PLS is a commonly used method as it typically provides satisfactory results. However, more work needs to be done in chemometric modeling of SOM fractions, specifically, when the studies encompass a broad geographical range. One important question is whether it would be feasible to develop a single, global predictive calibration to estimate a given property at a continental scale, or alternatively, separate local calibrations for soils with different climates and land uses.
Although several studies have addressed this question through a local approach, resulting in higher performance accuracy, a robust continental-scale model would be especially beneficial since it does not require site-specific data and may include a greater source of variability. This is especially important considering that a local model can become invalid if it is unable to accurately capture a variation in the variable of interest as a result of an environmental change. Accordingly, the main objectives of this study were to: 1) Identify MIR spectral changes associated with variations in SOM, MAOM and POM C and N content; 2) Determine MIR spectral regions that may improve the performance of PLS in order to estimate SOM, C and N content in MAOM and POM; 3) Explore the potential use of MIR spectroscopy coupled with PLS to develop continental-scale calibrations for C and N estimations in bulk soil and SOM fractions, as well as soil attributes controlling soil C content under a broad range of environmental conditions. To this end, we scanned 349 topsoil samples from across Europe, with widely different soil properties, climate and vegetation conditions. We also tested the ability of MIR PLS to predict the C and N content in bulk soils and SOM fractions as well as the main factors influencing C storage.

SOM Samples and Site Data
For this study, we used the total soil and SOM fractions from Lugato et al. (2021). Briefly, 400 representative soil samples from the Land Use and Coverage Area frame Survey (LUCAS; Orgiazzi et al., 2018) across the European Union and United Kingdom were selected for SOM fractionation (Supplementary Figure S1). These were all mineral (soil organic C < 120gCkg −1 ) topsoils (0-20cm). Size fractionation was conducted to separate POM (>53μm) from MAOM (<53μm) by wet sieving after mechanical dispersion of aggregates by shaking in sodium hexametaphosphate with beads, as described in Cotrufo et al. (2019). Briefly, for each sample 5g of 2mm sieved and oven dried soil was shaken in dilute (0.5%) sodium hexametaphosphate with beads for 18h to break the aggregates. The dispersed soil was then rinsed onto a 53µm sieve, with the fraction remaining on the sieve being collected as POM, and the fraction passing through (<53µm) collected as MAOM. This fractionation approach, while convenient for the high throughput, defines POM by size (>53μm), thus small amounts of very fine POM could be recovered in the MAOM fraction, and OM associated to coarse minerals (e.g., sand) is recovered as POM. A separation of POM by size and density, when feasible, is advisable. All bulk soils, MAOM and POM samples were pulverized, oven dried at 60°C and analyzed for C and N concentration in an elemental analyzer (LECO TruSpec CN). Inorganic C was removed by acid fumigation (Harris et al., 2001) before elemental analyses, in the few samples where it was present. Due to a variety of reasons (non-readable labels, broken sample containers during transportation and general quality checked after fractionation), 349 out of 400 were retained for MIR and statistical analysis.

Mid-infrared Measurement
The 349 bulk soil samples were air-dried, ground to fine powder, and scanned undiluted (neat) in the mid-infrared spectral region (4,000-400cm −1 ). The spectra were obtained using a Digilab FTS 7000 spectrometer (Agilent Technologies, Walnut Creek, CA) SOM C, organic C content in soil organic matter (SOM); MAOM C, organic C content in mineral-associated organic matter (MAOM) fraction; POM C, organic C content in particulate organic matter (POM) fraction; TN, total nitrogen; MAOM N, total N content in MAOM; POM N, total N content in POM fraction; f MAOM, proportion of C in mineral-associated organic matter (MAOM) relative to soil organic C; C/N, soil carbon to nitrogen ratio; C/N MAOM, carbon to nitrogen ratio in MAOM fraction; C/N POM, carbon to nitrogen ratio in POM fraction; BD, bulk density; CEC, cation exchange capacity; VGR, volume of gravel.
Frontiers in Environmental Science | www.frontiersin.org May 2021 | Volume 9 | Article 634472 with a KBr beam splitter, and a Peltier-cooled DTGS detector. The samples were analyzed in diffuse reflectance, and each spectrum consisted of 64 co-added scans at 4cm −1 resolution. Scans were conducted in duplicate for each sample, and subsequently, the average spectrum was calculated using GRAMS/AI software (Thermo Scientific).

Chemometric Analyses and Mathematical
Pre-treatment of the Spectral Data Partial least Square (PLS) regressions were performed to estimate C and N pools, clay, pH, C/N ratio and soil texture (clay and sand content). Several mathematical pre-treatments were tested, including the Savitzky-Golay function, the second derivative, standard normal variate (SNV), and multiplicative scatter correction (MSC) to reduce baseline variations and/or enhance spectral features. The best improvement was reached using Savitzky-Golay functions with a first derivative using three point smoothing (Savitzky and Golay, 1964). A non-linear Kernel partial least squares (PLS) regression model was used for building the predictive models. During model calibration, a priori outlier detection techniques were adopted to remove influential outliers by plotting the residual X-and Y-variances against leverages and critical thresholds as calculated from Hotelling's T 2 and Q-residuals statistics. The fraction of outliers removed from the models varied from 2 to 2.5%. The PLS calibrations were validated using a random cross-validation approach. The cross-validation was performed by randomly dividing the calibration set into multiple segments using the default setting of the software (Westad and Marini, 2015). The spectral analysis and the development of PLS models were carried out in the Unscrambler chemometric software V.10.3 (CAMO, Norway).

Statistical Analyses
The accuracies and the robustness of the PLS models were evaluated for calibration and prediction using the root mean square error (RMSE) and root mean square error of crossvalidation (RMSEcv), determination coefficient (R 2 ) and the ratio of prediction to deviation (RPD), which was calculated as the standard deviation divided by RMSE. The RMSE is a measure of the difference between the predicted value and measured value, while the R 2 indicates how well the model calibration and validation fits the measured data. RPD is a dimensionless parameter that represents the ratio of the standard deviation to the RMSE. The model performance was classified according to the following RPD intervals (Viscarra Rossel et al., 2006;Bellon-Maurel et al., 2010): RPD ≥2 to indicate the model performance was very good to excellent and reliable for quantitative predictions; 1.8 < RPD <2 to indicate satisfactory performance for quantitative predictions, but may be further improved by other approaches; RPD <1.8 to indicates fair predictions and RPD <1.4 poor or insufficient for most applications.
The database was analyzed by Principal component analysis (PCA) and subsequent redundancy analysis (RDA) to elucidate the importance of explanatory variables such as land use types, climate and soil properties on soil spectral properties. Before executing the PCA, the averaged spectra for each soil were baseline corrected and mean-centered. In the RDA, the most significant discriminating variables on spectral bands were selected based on PCA loadings, which was carried out with CANOCO version 5.01 (Microcomputer Power, United States). A correlation matrix provided in the supplementary material was calculated to determine the strength of the relationships between soil properties using the corrplot package in R. A p-value of less than 0.05 was considered statistically significant. The Pearson's correlation (r) analysis between MIR absorbance values and C and N content in SOM, MAOM and POM were represented using contour/heatmaps. Correction for baseline offsets was applied to the spectra, and the results (r) were linearly interpolated into a grid to smooth the contour/heatmaps (Sigmaplot v.11).

Effects of Spatial Heterogeneity on Spectral Features
We performed PCA using the full MIR spectral range of the sample set to determine the influence of climate and land use on the spectral response of bulk soils. The first component explained 69% of the variance, whereas the second and third components accounted for 17 and 5%, respectively. In general, the sample score distribution shows that the variation in the spectral signatures of soils was associated with differences in the aridity index (AI) and land use types (Figure 1). In the PCA analysis, samples from arid climates were separated from soils under more humid conditions ( Figure 1A). For land use types, only woodland soils had a noticeably different pattern from other land uses, which tended to overlap with each other ( Figure 1B). Furthermore, the sample score distribution shows that the variation in the spectral features was also associated with differences in C content. Samples were separated from soils that exhibited low and high SOM C content, however, some samples with contrasting C content also exhibited similar spectral composition (Supplementary Figure S2).
To further explore these differences, we used the PCA loadings and score distribution to reduce the dimensionality of the spectral data, as well as to identify the spectral features contributing to the maximum variance within the sampling set. The loading plot ( Figure 1C) showed that various spectral bands from organic compounds had positive loadings in PC1. The spectral differences were associated with higher absorbances in the 3,000-1,500cm −1 region, attributed to the presence of C-H stretching vibrations and aromatic groups near 1,600-1,570cm −1 and in the 2,900-2,940 cm −1 region. A pronounced band characterized the second loading in the 1,200-1,042cm −1 corresponding to C-O stretching of polysaccharides. It should be noted that the band of polysaccharides may overlap with mineral compound bands at 2000-1800cm −1 and 1,030cm −1 , which can also be attributed to Si-O groups of quartz and clay minerals, respectively. Furthermore, we found positive loading bands at 2520 -1 cm and 3,620cm −1 corresponding to carbonates and stretching vibrations of OH groups in lattice clay minerals, respectively.

Relating Soil Spectral Signatures to Environmental Variables
The relationship among soil properties, land use type and climate, along with a selected subset of spectral bands, were identified using redundancy analysis (RDA) (Figure 2). The bands in the RDA were selected in order of the highest loading per component ( Figure 1A). Bare land and shrubland were excluded from this analysis due to the small number of samples within these categories. The results show that C accumulation in the MAOM fraction (f MAOM) was particularly high under agricultural conditions with inherently low SOM C (Figure 2, Supplementary Table S1). Furthermore, clay was found to significantly increase the C storage in the MAOM fraction ( Figure 2, Supplementary Figure S3). These soil features were associated with the absorption bands at 2,510cm −1 and 3,624cm −1 (Figure 2), corresponding to carbonate and clay minerals, respectively. In more humid conditions, woodland and grassland were positively correlated with SOM C, sand, and C and N content in the POM fraction. In addition, these conditions show a substantial increase in various functional groups attributed to aliphatic C (3,000-2,800cm −1 ), carboxyl C O (1700-1800cm −1 ), aromatic C (1,600-1,500cm −1 ) and amide N-H (1,550-1,500cm −1 ).

Spectral Variations of C in SOM, MAOM, and POM
In large and complex datasets, the exploration of spectral variations associated with local conditions might be used to identify uninformative MIR regions. In this way, redundancy and collinearity are reduced so as to increase the model accuracy.
Smoothed heatmaps show that the Pearson's correlation  Figure S4). Heatmaps illustrate that overall, the position and the magnitude of correlation between spectral bands and the C content in SOM, MAOM and POM differed among woodland, grassland and cropland ( Figure 3). For example, POM fraction heatmaps showed a stronger correlation in the 2000-1,000cm −1 region for grassland and woodland ( Figure 3C). In contrast, this region was almost indiscernible for cropland. In addition, for woodland, evident differences were observed in the organic matter and mineral absorbances in MAOM fraction, especially in MIR region (2000-1,000cm −1 ). Furthermore, correlation analysis also shows other spectral patterns that are related to soil organic or inorganic C pools (Supplementary Figure S4). We found high correlations between C in SOM, MAOM and POM, and aliphatic bands around 2,850-2,920cm −1 for grassland and woodland, but less so for cropland. In addition, woodland exhibited a low correlation with the Si−O groups in quartz near 1800cm −1 for SOM C and MAOM C (Supplementary Figure S4). For inorganic C, the band at 2,525-2,510cm −1 had a strong correlation with CaCO 3 content in the grassland and cropland, but less so in the woodlands. The lattice clay absorbance bands between 3,650 and 3,595cm −1 were positively correlated to clay content as expected, but slightly negatively correlated to organic C and POM C in soils from every land use type.

Prediction of C and N Content in MAOM and POM Fractions
The performance ability of calibration models is shown in Table 2. Overall, low values of RMSE and high R 2 indicate better predictive performance, however, RPD is used to standardize RMSE values against the SD, as RMSE is a dependent concentration parameter. For C pools, the best PLS-MIR model was developed within the 2000-400cm −1 spectral range, which included the major absorption bands related to C pools previously identified in Figure 3. For C in total SOM and MAOM ( Figure 4A; Figure 4B), good and reliable estimates were obtained using a global calibration (R 2 0.80-0.79; RPD 2.23-2.17, respectively). However, the crossvalidation for C in SOM tended to underestimate the observed concentrations, thus a large number of data points were distributed below the 1:1 line showing a high dispersion at C content over 50gCkg −1 soil ( Figure 4A). In contrast, poor prediction accuracy was obtained for POM C showing a curvilinear trend (R 2 0.61; RPD 1.60). The model had a clear tendency to underestimate the prediction when the measured C content was higher than 20gCkg −1 soil ( Figure 4C).
For N pools, the use of the full MIR range demonstrated a better performance than models constructed using a wavelength selection ( Table 2). Although the prediction of N in total SOM and MAOM was less satisfactory than for C, good results were also achieved for these variables (R 2 0.64-0.51; RPD 1.88-2.08, respectively) ( Figures 5A,B). Regarding POM N ( Figure 5C), a poor prediction was obtained and insufficient for most applications, showing similar behavior to POM-C (R 2 0.47, RPD 1.37).

Prediction of Clay, pH, Soil C/N Ratio and Sand
The distribution of C and N in MAOM and POM fractions is highly controlled by predictors such as C/N ratio, pH, clay and sand ( Figure 2). We compared the PLS model performance for the aforementioned soil properties (Table 2; Figure 6). Predictions for soil C/N ( Figure 6A) were not improved (R 2 0.67; RPD 1.75) when compared with the individual predictions for C and N in SOM. For pH ( Figure 6B), the best PLS-MIR model was developed using the 400-4,000cm −1 spectral range, which showed a good prediction performance (R 2 0.76; RPD 2.06). Regarding the soil texture variables ( Figures 6C,D), similar accuracies and excellent estimates were achieved for both clay (R 2 0.81; RPD 2.28) and sand (R 2 0.83; RPD 2.41).

DISCUSSION
Large spectral libraries demand rapid and inexpensive methods such as infrared spectroscopy to screen and determine soil C stocks and soil properties. However, the wide range of soil-forming factors that influence soil spectroscopic characteristics may decrease the predictive ability of models when they are used to estimate soil properties at a continental areas. We explored the relationship between MIR spectral features and C pools and evaluated the ability of MIR spectroscopy to predict soil C content and variables associated with its dynamic, at continental scale.

Linking MIR Spectral Information to Soil C Pool
Understanding and capturing the spatial variability of soil spectral signatures at regional to continental scales is relevant to developing a robust global calibration strategy and to subsequently improving the predictive ability of soil C models. Our results showed inherent soil spectral heterogeneity associated with climate and land use (Figure 1). Likewise, climate and land use are widely accepted as the most significant drivers of C content when we evaluate their distribution at large geographical scale (Rial et al., 2017;Ogle et al., 2019). Our data show that soils under drier conditions tend to have a relatively high soil pH (Figure 2), which is not surprising given the existence of calcareous materials characterized by the presence of carbonates at the 2,513cm −1 band, concurrent with FIGURE 3 | Heatmaps displaying the magnitude of the Pearson's correlation coefficients (r) between MIR wavelength range and carbon (g Ckg −1 soil) in soil organic matter (SOM), mineral-associated organic matter (MAOM) and particulate organic matter (POM) for cropland, grassland and woodland separately. The color scale next to the maps represents the correlation coefficient r.
Frontiers in Environmental Science | www.frontiersin.org May 2021 | Volume 9 | Article 634472 7 low organic matter content (Asgari et al., 2020). In semiarid environments, the decline in primary productivity and increase in litter decomposition due to higher temperatures and intermittent precipitation has resulted in less C storage when compared to colder and more humid conditions (Guo and Gifford, 2002). In fact, we observed low POM C content in agricultural systems under semiarid conditions (Figure 2). In addition, we observed a higher proportion of C in the MAOM fraction (f MAOM) and clay content. The latter responses are consistent with the correlation observed for the clay band at 3,620cm −1 , as well as the presence of organic matter absorbance around the 1,100cm −1 band of polysaccharides (Rasmussen et al., 2018).
MIR spectroscopy has been used for qualitative analysis of various soil properties because it contains extensive information 2 | Results obtained for the mid infrared spectroscopy coupled with the partial least square model regression selecting the best pre-treatment (Savitzky-Golay first derivative, 3-point smoothing, 2-polynomial). The performance of PLS models is reported in terms of root mean square error (RMSE), coefficient of determination for calibration (R 2 ), slope of regression model and the ratio of prediction to deviation (RPD). SOM C, organic C content (g Ckg −1 soil) in soil organic matter (SOM); MAOM C, organic C content (g Ckg −1 soil) in mineral-associated organic matter (MAOM) fraction; POM C, organic C content (gCkg −1 soil) in particulate organic matter (POM) fraction; TN, total nitrogen (gNkg −1 soil); MAOM N, N content(gNkg −1 soil) in MAOM fraction; POM N, N content (g Nkg −1 soil) in POM content; C/N, soil carbon to nitrogen ratio; clay: content in percentage (%); sand: content in percentage (%).

Soil property
FIGURE 4 | Scatter plots for predictions of (A) carbon in total soil organic matter (SOM), (B) carbon in mineral-associated organic matter (MAOM); and (C) carbon in particulate organic matter (POM) resulting from continental-scale calibrations using mid infrared spectroscopy and partial least square regression. The model calibration is denoted in blue circles and predictions on the calibration set during cross-validation is denoted in oranges circles. The 1:1 line is shown for reference in each plot.
Frontiers in Environmental Science | www.frontiersin.org May 2021 | Volume 9 | Article 634472 8 on the functional moieties of organic matter (Ramírez et al., 2020;Dudek et al., 2021). Additionally, it is a useful tool to explore spectral patterns associated with soil C stocks (Stumpe et al., 2011). For example, as in many previous studies, we also found aliphatic CH spectral bands at 2,926cm −1 (Figure 2), which could be used to rapidly estimate soil C, and N concentrations of northern cold region soils (Matamala et al., 2017). Likewise, we also found distinct bands in the region between 1,510 and 1,230cm −1 , which have already been reported in association with the decomposition of forest litter in soils (Haberhauer et al., 1998). It should be noted, however, that band assignments are not absolute but rather are an approximation in the sense that a band for a particular compound is assigned at a specific wavenumber.
The MIR spectral bands can be used as valuable data to monitor major chemical changes from different land use systems. It is interesting to note that although MIR is not a high-resolution tool, the information provided by MIR spectroscopy was able to elucidate spectral patterns across different C pools. For example, the spectral patterns between POM and MAOM involved variation in absorbances between the 2,800 and 1,000cm −1 region (Figure 3). This spectral region is characterized by silicates (Si-O vibrations) and different C functional groups, including aliphatic C-H, carbonyls, aromatic C and C-H stretching typically found in carbohydrates (Parikh et al., 2014). This analysis revealed that POM displays high spectral similarities to SOM C in woodland systems, while cropland and grassland did not. Furthermore, in the cropland systems, we found similar spectral patterns between SOM and MAOM, as well as a substantial decrease of organic C absorbances in the POM fraction. This is consistent with the relative distribution of C (f MAOM) between these fractions (Supplementary Table S1) with a higher amount of C in POM in woodland soils with overall higher C content and lower disturbance as compared to grassland and cropland soils (Lugato et al., 2021).

Prediction of Soil C and N Pools and Other Soil Properties
We achieved accurate global predictions using MIR coupled with PLS spectroscopy for C and N pools at a continental scale. In the absence of some of the wavelengths containing unrelated information, the wavelengths 2000-400cm −1 had a substantial influence on the PLS model for the quantification of C in SOM and MAOM. As often reported for soil C, the multiple and small bands around 830-400cm −1 , as well as the fingerprint region (1,500cm −1 -650cm −1 ) contribute to improve the predictive ability of PLS models (Cécillon et al., 2012;Knox et al., 2015). Our results showed that MIR produced accurate predictions to determine C and N in SOM and MAOM ( Figure 4). Typically, the prediction accuracies reported for total soil N are similar to the accuracies of the SOM C models. This is mainly because most of the N is bound in the SOM, and thus, soil N is often tightly correlated to total soil C (Schirrmann et al., 2013). C and N in POM were less accurately predicted than other C or N pools ( Figures 4C, 5C). For example, our prediction yielded a low accuracy for POM C (R 2 0.61), indicating that the PLS calibration could not capture the variability over a wide range of 0.46-54.9gCkg −1 . Although previous studies have reported satisfactory accuracies for C prediction in the POM fraction using MIR spectroscopy (R 2 0.71-0.83), they were carried out in the 0.2-16.8gCkg −1 range of concentration (Janik et al., 2007;Baldock et al., 2018). However, recent studies using heterogeneous soil samples at a country level similarly reported that SOM C and MAOM C were better predicted than POM C (Sanderman et al., 2020). These results can be FIGURE 5 | Scatter plots for predictions of (A) total soil nitrogen (TN), (B) nitrogen content in mineral-associate organic matter (MAOM N) and (C) nitrogen content in particulate organic matter (POM N) resulting from global calibrations from a large-scale dataset using mid infrared spectroscopy and partial least square regression. The model calibration is denoted in red circles and predictions on the calibration set during cross-validation is denoted in yellow circles. attributed to the fact that POM is more heterogeneous than MAOM when isolated by size, since it includes both a light fraction not associated with minerals, as well as heavy coarse organic matter associated with sand-size minerals (Lavallee et al., 2020). Therefore, the spectra are subject to a high degree of variability due to the various decomposition stages of plant residues, and to the range of contribution of compounds of microbial origin which adversely impact the calibration accuracy. Furthermore, the light component of POM mostly consists of partially decomposed plant material, so differences in the vegetation cover (woody plant cover vs grasses, for example) affect the chemical character of POM (Soucémarianadin et al., 2019).
We found that C estimations in SOM were more reliable than those provided by studies that used NIR spectroscopy in the same large-scale European dataset (Nocita et al., 2014;Rial et al., 2017;Ward et al., 2019). For example, the selection of sample subsets by k-Means clustering achieved less accurate predictions (R 2 0.67; RPD 1.74) (Ward et al., 2019). In particular, MIR spectra contain fundamental bands for several organic and mineral moieties, while soil vis-NIR ranges are largely non-specific, consisting of weak, broad and overlapping absorption bands (Reeves, 2010). Furthermore, we found that MIR was highly sensitive to soil mineral composition, and appeared well equipped to determine pH and soil texture, such as clay and sand using PLS models ( Figure 6). It is important to considerer we used a size FIGURE 6 | Scatter plots for predictions of (A) soil C/N ratio, (B) pH, (C) clay and (D) sand resulting from global calibrations from a large-scale dataset using MIR spectroscopy and PLS regression. The model calibration is denoted in green circles and predictions on the calibration set during cross-validation is denoted in blue circles. The 1:1 line is shown for reference in each plot.
Frontiers in Environmental Science | www.frontiersin.org May 2021 | Volume 9 | Article 634472 separation of POM and MAOM. Therefore, as mentioned above, the organic matter associated to sand-size particles was considered POM, and could have contributed proportionally more to this fraction in coarse-textured soils. However, despite this limitation, we suggest that the influence of sand content was probably low for the prediction of C and N in POM. MIR spectra present greater signatures in quartz and clay minerals, which provide more robust calibrations to predict clay and sand content (Viscarra Rossel et al., 2006;Reeves, 2010). Therefore, in the presence of quartz-rich sand content, we clearly would expect more accurate predictions for POM than those achieved in this study. Another limitation of our study was the negative predictions obtained for low concentration values in some variables of interest. For example, negative predicted values were obtained for measured C content below 6.5gCkg −1 soil, due to the wide range of concentrations used in the analysis (2.6-110.8gCkg −1 soil). Several authors have been able to overcome this limitation, improving their predictions by using machine learning based non-linear regression methods, covariate adjustment (i.e., sand content) and by dividing the global library into small homogeneous datasets (Nocita et al., 2014;Rial et al., 2017;Ward et al., 2019).

CONCLUSION
MIR spectra coupled with PLS regression is a promising and reliable tool for predicting soil organic C, total N, C and N in MAOM, as well as the critical factors controlling their accumulation. The present study is a first attempt to estimate C and N content in MAOM and POM fractions using MIR spectroscopy in a heterogeneous set of soil samples at a continental scale. A wavelength selection in the 2000 to 1,000cm −1 range substantially influences the C estimate accuracy in bulk soils and the MAOM fraction, while the estimates of C and N in POM need to be further studied.
Besides the ability of MIR to predict different soil properties, MIR spectroscopy was able to differentiate spectral features associated with SOM pools in response to changes in land use and climate. This approach can enable large scale monitoring of soil C changes as a cheaper and higher throughput alternative to stock quantification methods that are more accurate but also more expensive and involve laborious sample preparation. Considering the effectiveness of MIR spectroscopy in resolving variations in organic and mineralogical composition, future work still is needed on an extensive MIR spectral database to capture the spatial variability of soil properties at different geographical scales. This is critical for the comprehension of environmental interrelationships affecting C distribution and increasing the robustness of predictions.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
PBR, MFC and FJC developed the research concepts, interpreted the data and wrote the paper. PBR conducted the data and statistical analyses. MH conducted all the soil organic matter fractionation and contributed to writing of the methods. EL contributed the LUCAS dataset and provided edits and comments to the paper.