Spatial Pattern and Environmental Drivers of Acid Phosphatase Activity in Europe

Acid phosphatase produced by plants and microbes plays a fundamental role in the recycling of soil phosphorus (P). Quantifying the spatial variation in potential acid phosphatase activity (AP) at large spatial scales, together with its drivers, can help to reduce the uncertainty in our understanding of the bioavailability of soil P. We applied two machine-learning methods (random forests and back-propagation artificial networks) to simulate the spatial patterns of AP across Europe by scaling up 126 site observations of potential AP from field samples measured in the laboratory, using 12 environmental drivers as predictors. The back-propagation artificial network (BPN) method explained 58% of AP variability, more than the regression tree model (49%). In addition, BPN was able to identify the gradients in AP along three transects in Europe. Partial correlation analysis revealed that soil nutrients (total nitrogen, total P, and labile organic P) and climatic controls (annual precipitation, mean annual temperature, and temperature amplitude) were the dominant factors influencing spatial variations in AP. Higher AP occurred in regions with higher mean annual temperature, higher precipitation and higher soil total nitrogen. Soil TP and Po were non-monotonically correlated with modeled AP for Europe, indicating different strategies of P utilization by biomes in arid and humid areas. This study helps to separate the influences of each factor on AP production and to reduce the uncertainty in estimating soil P availability. The BPN model trained with European data, however, could not produce a robust global map of AP because representative measurements of AP are lacking for tropical regions. Filling this data gap will help us to understand the physiological basis of P-use strategies in natural soils.

where N is the number of depths, * is the k-th depth and * is the OC at depth * .
Climatic information (MAT, AMP and MAP) was obtained from the WorldClim dataset (version 2.0; http://worldclim.org/version2), which was produced by interpolating weather-station data combined with satellite data (Fick & Hijmans, 2017) (Table S1). Both of those datasets have a spatial resolution of 1 km and were resampled to 10 km before the extrapolation.
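The resampling method is not stated here. As a minimal sketch, assuming simple block averaging in which each 10 km cell is the mean of the valid 1 km cells it contains, the step could look as follows. The function name `block_mean` and the toy grid are hypothetical and are not taken from the study.

```python
import math

def block_mean(grid, factor=10):
    """Aggregate a 2-D grid to a coarser resolution by averaging the
    non-missing cells in each factor x factor block (assumed method)."""
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            cells = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if not math.isnan(grid[r][c])
            ]
            # A block without any valid cell stays missing.
            row.append(sum(cells) / len(cells) if cells else math.nan)
        coarse.append(row)
    return coarse

# Toy 20 x 20 "1 km" grid aggregated to a 2 x 2 "10 km" grid.
fine = [[float(r // 10 * 2 + c // 10) for c in range(20)] for r in range(20)]
coarse = block_mean(fine, factor=10)
```

Missing cells are skipped rather than propagated, so a coarse cell is missing only when its whole block is missing; a stricter rule may be preferable near coastlines.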

S2. Gap filling and pre-processing of observation datasets and predictors
For sites with soil pH <6 and no OC data, we used total C content instead of OC, because inorganic C is largely in carbonate forms that do not occur in acid soils (Nelson & Sommers, 1996). Environmental data, such as soil nutrient concentrations, often have skewed distributions (Blackwood, 1992). Fitting lognormal distributions is a common way to represent this kind of data in statistical analysis (Blackwood, 1992). We tested the distributions of the 296 measurements of AP and of the 10 numerical predictors for all global pixels (Figure S2). Acid phosphatase activity was found to follow a lognormal distribution, so AP was log-transformed. The contents of soil nutrients (OC, TN, TP and Po) are also lognormally distributed, as are NPP, AMP and MAP. Therefore, all these predictors were also log-transformed.
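The effect of the transformation can be illustrated with a short sketch. It assumes a natural-log transform and uses sample skewness as a simple diagnostic; the distribution test actually used in the study is not specified here. The synthetic sample and the function names are hypothetical.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_transform(values):
    """Natural-log transform; requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]

# Synthetic lognormal sample standing in for a skewed variable such as AP.
rng = random.Random(0)
ap = [rng.lognormvariate(1.0, 0.8) for _ in range(2000)]
log_ap = log_transform(ap)

skew_raw = skewness(ap)      # strongly right-skewed
skew_log = skewness(log_ap)  # near zero if the data are lognormal
```

A skewness near zero after the transform is consistent with a lognormal distribution but does not prove it; a formal normality test on the log-transformed values would be the stricter check.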

S3. Optimal number of neurons for BPN
The performance of a BPN is partly determined by the number of neurons in each hidden layer. To find the optimal number, we evaluated BPN models for all combinations of 1-20 neurons in each of two hidden layers. The BPN models were trained 1000 times using different training-testing subsets (85% for training, 15% for testing). The lowest root mean square error (RMSE) on the test datasets occurred for a BPN with about 10 neurons in hidden layer 1 and 5 neurons in hidden layer 2 (Figure S3).
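The search described above can be sketched as a small harness. This is an illustration under stated assumptions, not the code used in the study: the repeats are assumed to apply to every (layer 1, layer 2) combination, the mean test RMSE is assumed to be the selection criterion, and `fit_predict` is a hypothetical callback that should train a BPN and return predictions. The `toy_fit_predict` stand-in below is deliberately not a neural network; it carries an artificial penalty only so that the selection logic can be exercised.

```python
import itertools
import math
import random

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def search_hidden_sizes(x, y, fit_predict, sizes=range(1, 21),
                        n_repeats=1000, train_frac=0.85, seed=0):
    """Return {(n1, n2): mean test RMSE} over repeated random splits."""
    rng = random.Random(seed)
    n_train = int(round(train_frac * len(y)))
    scores = {}
    for n1, n2 in itertools.product(sizes, sizes):
        errors = []
        for _ in range(n_repeats):
            idx = list(range(len(y)))
            rng.shuffle(idx)  # new random 85/15 split on each repeat
            train, test = idx[:n_train], idx[n_train:]
            pred = fit_predict(
                [x[i] for i in train], [y[i] for i in train],
                [x[i] for i in test], hidden=(n1, n2),
            )
            errors.append(rmse([y[i] for i in test], pred))
        scores[(n1, n2)] = sum(errors) / len(errors)
    return scores

def toy_fit_predict(x_train, y_train, x_test, hidden):
    """Placeholder, NOT a neural network: predicts the training mean plus
    an artificial penalty that vanishes only at hidden == (3, 2)."""
    mean = sum(y_train) / len(y_train)
    penalty = 5.0 * (abs(hidden[0] - 3) + abs(hidden[1] - 2))
    return [mean + penalty for _ in x_test]

# Small synthetic run; the study's settings are the function defaults.
data_rng = random.Random(1)
x = [[data_rng.random()] for _ in range(40)]
y = [data_rng.random() for _ in range(40)]
scores = search_hidden_sizes(x, y, toy_fit_predict,
                             sizes=range(1, 5), n_repeats=20)
best = min(scores, key=scores.get)
```

With a real network in place of the stand-in, the full 20 x 20 grid with 1000 repeats implies 400,000 training runs, so the loop over combinations is a natural place to parallelize.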

S4. Detection of outliers
Over all measurements, only 19% of the variance in AP was explained, accompanied by a very high root mean square error (RMSE) of 18.2 µmol g⁻¹ h⁻¹ (Table 2). Nine outliers (Sites A-I) with extreme bias (absolute bias >20 µmol g⁻¹ h⁻¹ and relative bias >50%) were detected (Figure S4, Table S3).
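The two thresholds can be applied jointly as in the sketch below. It assumes that bias is predicted minus observed AP and that relative bias is expressed relative to the observed value; neither definition is spelled out in this section. The function name and the example values are hypothetical.

```python
def flag_outliers(observed, predicted, abs_limit=20.0, rel_limit=0.5):
    """Return indices of sites whose absolute bias exceeds abs_limit
    (µmol g^-1 h^-1) AND whose relative bias exceeds rel_limit."""
    flagged = []
    for i, (obs, pred) in enumerate(zip(observed, predicted)):
        bias = pred - obs
        # Relative bias is taken relative to the observation (assumption).
        rel = abs(bias) / abs(obs) if obs != 0 else float("inf")
        if abs(bias) > abs_limit and rel > rel_limit:
            flagged.append(i)
    return flagged

# Made-up values for illustration only.
observed = [10.0, 80.0, 100.0, 30.0, 200.0]
predicted = [12.0, 30.0, 115.0, 55.0, 170.0]
flagged = flag_outliers(observed, predicted)
```

Requiring both criteria keeps sites with high AP, where a bias of 20 µmol g⁻¹ h⁻¹ is proportionally small, from being flagged; the last site in the example exceeds the absolute limit but not the relative one.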

S5. Comparison of BPN performance on tropical sites and temperate sites outside Europe
To assess whether the global extrapolation of AP is acceptable, we compared the performance of the BPN on tropical sites and on temperate sites outside Europe. The predicted AP values were generated (1) by reproducing AP for temperate and tropical sites with complete information and (2) by extracting values from the global pattern of predicted AP at the coordinates of the measurement sites (Figure S12). Because TN strongly affects the AP pattern (Figure 8), but TN extracted from the global gridded dataset ISRIC-WISE agrees poorly with the values reported in the literature (Figure S1), we reproduced only the sites with original TN for (1).
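Extraction by coordinates, as in approach (2), amounts to finding the grid cell that contains each site. The sketch below assumes a regular longitude-latitude grid stored north to south with a known upper-left corner and cell size; the function name `extract_at`, the grid layout and the toy values are hypothetical.

```python
import math

def extract_at(grid, lon, lat, lon_min, lat_max, res):
    """Return the value of the cell containing (lon, lat), or None if the
    point lies outside the grid. Row 0 is the northernmost row."""
    col = math.floor((lon - lon_min) / res)
    row = math.floor((lat_max - lat) / res)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Toy 3 x 4 grid with 10-degree cells; upper-left corner at (-20E, 70N).
grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
value_a = extract_at(grid, lon=2.3, lat=48.9, lon_min=-20, lat_max=70, res=10)
value_b = extract_at(grid, lon=-15.0, lat=65.0, lon_min=-20, lat_max=70, res=10)
value_c = extract_at(grid, lon=50.0, lat=50.0, lon_min=-20, lat_max=70, res=10)
```

Returning `None` for points outside the grid makes sites without coverage explicit, so they can be excluded from the comparison rather than silently matched to an edge cell.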

S6. Understanding the failure to extrapolate AP to tropical regions
The relationships between AP and environmental factors reported for tropical regions differed greatly

Figure S1
The comparison of Pearson correlation and rank correlation (r) between observed predictors and extracted values from 4 different gridded metrics.

Figure S2
The distribution of 10 numerical predictors used in this analysis for global pixels.

Figure S3
The root mean square error (RMSE; µmol g⁻¹ h⁻¹) on training and test datasets using different numbers of neurons in the two hidden layers of the back-propagation models.

Figure S4
The distribution of 9 sites with very biased estimates of AP by BPN. Error bars show the 25%-75% quantiles of AP for the same soil (or biome) type.

Figure S9
Explanatory power of climate (MAT, AMP, MAP) for NPP across Europe. The regression analysis was conducted using a spatial moving window of 4.5° × 4.5°; climate variables and NPP were first resampled to half-degree resolution using area-weighted means.
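An area-weighted mean on a regular longitude-latitude grid can be sketched as follows, assuming that cell area is approximately proportional to the cosine of the latitude of the cell centre. The weighting actually used for the resampling is not specified here, and the function name and example values are hypothetical.

```python
import math

def area_weighted_mean(values, lats_deg):
    """Mean of grid-cell values weighted by cos(latitude), an approximation
    of relative cell area on a regular longitude-latitude grid."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Two cells: one at the equator, one at 60N (about half the area).
mean_value = area_weighted_mean([1.0, 3.0], [0.0, 60.0])
```

In this example the cell at 60°N carries about half the weight of the equatorial cell, so the result (about 1.67) lies closer to the equatorial value than the unweighted mean of 2.0 does.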

Figure S11
Relationship between the aridity index (P/PET) and the partial correlation coefficients of AP with TP (blue) and labile Po (red). The aridity index is calculated as the ratio of mean annual precipitation to mean potential evapotranspiration over 1981-2018, derived from the Climatic Research Unit (CRU) dataset (CRU TSv4.03). Blue and red lines indicate the regressions between the partial correlation coefficients and the aridity index. Grey shading indicates where the partial correlation is not significant (p>0.1). The error bars indicate the 10% and 90% quantiles of predicted AP.