Grand Challenges in Pedometrics-AI Research


FOUNDATIONS OF PEDOMETRICS AND DIGITAL SOIL MAPPING
The discipline of pedometrics combines pedology (i.e., understanding of the physical, chemical, and biological soil properties, patterns, and their genesis) and quantitative modeling of soils. Pedometrics research has focused extensively on modeling soil properties and associated uncertainties from the field scale to the large landscape scale (1,2). Artificial intelligence (AI), specifically machine learning (ML) and deep learning (DL) algorithms, has driven a profound transformation of the discipline and brought new challenges.
The conceptual frameworks underlying pedometrics and digital soil mapping (DSM) have been rooted in factorial models that relate soils to the factors that influence soil formation, so-called soil-environmental covariates (3,4). These soil factorial frameworks have moved from the conceptual CLORPT soil formation model (5,6) and the spatially and temporally explicit SCORPAN framework (7) toward the spatially and temporally explicit STEP-AWBH model frame (8-11). The general approach of soil-factorial modeling using STEP-AWBH input variables to predict a soil property or class is showcased in Figure 1. The STEP-AWBH frame accounts for soil-landscape conditions (STEP factors) and the dynamics of the atmosphere/climate (A), water/hydrosphere (W), biosphere (B), as well as human activities (H) in the social, cultural, economic, and political domains (e.g., land management, carbon credit markets, economic incentives and programs, human resource capital). For example, the STEP-AWBH frame facilitates the incorporation of short-term temporally varying AWBH factors such as short-duration climatic variables (e.g., rainfall-runoff events preceding soil observations) as well as long-term climatic patterns and variations (e.g., 40-year average annual precipitation and the 40-year amplitude of temperature variation preceding soil observations) that have impacted pedogenesis in a study region. The STEP-AWBH frame is anchored in system theory, which views an ecosystem as a totality that integrates a multiplicity of domains. Thus, STEP-AWBH has moved factorial models closer to mechanistic Earth simulation models through the incorporation of pedological, biogeochemical, socio-cultural, economic, and political factors in the modeling process of soils.
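For orientation, the two earlier frameworks are commonly written in functional form, sketched below. The notation follows the usual textbook presentation rather than anything stated in this article, so the symbols should be read as assumptions; the STEP-AWBH formulation is given in the cited sources (8-11) and is not reproduced here.

```latex
% CLORPT: soil as a function of climate (cl), organisms (o), relief (r),
% parent material (p), and time (t); the dots denote unspecified further factors.
\[ S = f(cl,\, o,\, r,\, p,\, t,\, \ldots) \]

% SCORPAN: a soil class S_c or soil attribute S_a predicted from other soil
% information (s), climate (c), organisms (o), relief (r), parent material (p),
% age (a), and spatial position (n).
\[ S_c \ \text{or}\ S_a = f(s,\, c,\, o,\, r,\, p,\, a,\, n) \]
```

In both forms the arrow of inference runs from the factors to S, which is the one-directional reading discussed in the sections that follow.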
In general, conceptual factorial soil models have been implemented using purely spatial quantitative approaches (e.g., geostatistical methods), environmental correlation approaches (e.g., fitting methods such as multivariate regression, ML, and DL), and hybrid methods (e.g., regression kriging) (12-14). Purely mechanistic space-time simulations of soil genesis are still at the frontiers of pedometrics due to the following challenges: (1) the labor and costs of collecting soil data and up-to-date soil-environmental datasets, (2) the need for algorithms that appropriately model the pedosphere across spatial and temporal scales, (3) the subjectivity and idiosyncrasy of human impacts on landscapes, which vary widely with needs and with people's values and beliefs, and (4) insufficient incorporation of social, cultural, economic, and political dimensions into the modeling process. Although factorial soil models were intended to provide a mechanistic framework for soil formation, they have been used predominantly as functional fit models that formalize relationships between soil-forming factors and the resulting soils.

Knowledge-Based Soil-Factorial Models
Factorial soil prediction models have dominated the field of pedometrics to assess soil quality, security, health, fertility, productivity, and more (15,16). From a pedological perspective, these kinds of models are one-directional and aim to predict soils (S) across space: Soil-forming factors → S. Thus, such CLORPT-inspired soil prediction models are deterministic, assuming causality between soil-environmental covariates (cause) and soils (effect). The underlying philosophical paradigms are (1) constructivism, focused on the integration of new knowledge and understanding of soils that are part of the totality of the environment, and (2) participatory epistemology, which asserts that meaning arises through the participation of humans with the environment. For example, for a soil scientist meaning arises through the study of soils, for a farmer through cropping and the sustainability of soils, and for an environmentalist through care and protection of soils. The constructivist and participatory paradigms view humans and all life forms as agents that participate in the formation and use of soils (B and H → S); and vice versa, soils shape land use, impact water flux, influence climatic conditions, etc. (S → SCORPAN; STEP-AWBH). Such a knowledge-oriented view acknowledges feedback loops between soil-forming factors and S, adopting an integrative system perspective that honors meaning, connectivity, and understanding of soil-people relations (17).
Challenges remain in knowledge-based soil-factorial models to populate SCORPAN and STEP-AWBH factors, especially the B and H factors. A vast number of DSM applications have populated the B and H factors using land cover, land use, or spectral signatures derived from proximal or remote sensors [e.g., (12,14,18-21)]. These soil prediction models fall short of giving voice to the diversity of anthropogenic impacts on soils and reduce the H factor to a quantifiable variable in the environmental system; or worse, they neglect to populate the H factor altogether. Such commodification of B and H factors ignores how socio-cultural spheres (e.g., land use management, people's beliefs and valuation of nature and soils) and economic-political domains (e.g., conservation programs, cash crop markets, and environmental regulations) interact with soilscapes.

Data-Driven Soil Factorial Models
The adoption of AI-ML and AI-DL into soil science has pivoted research goals from understanding soil formation and patterns toward the search for "the best" performing soil prediction model. The approach amounts, quite literally, to "running the machine" to identify the ideal relations between SCORPAN or STEP-AWBH variables and a target soil property (S). Recently, Padarian et al. (22) provided a comprehensive review of digital soil models using DL, and Khaledian and Miller (23) reviewed ML methods for predictive soil mapping.
In general usage, prediction refers to estimating a future state. Factorial soil models, however, are unsuitable for forecasting or backcasting because pedological understanding conveys that the relations between soil-forming factors and S change over time (24). Thus, factorial models have been predominantly used in digital soil mapping to compute spatially explicit predictions of soil properties. For example, geospatial soil predictions for soil organic carbon, bulk density, cation exchange capacity, pH, soil texture fractions, and depth to bedrock were made by Hengl et al. (25) using Random Forest, Gradient Boosting, and neural networks. Padarian et al. (22) used Convolutional Neural Networks and Cubist to predict soil layers and soil organic carbon, while the authors of (26) used various ML algorithms (among them Random Forest, Bagged Regression Tree, Boosted Regression Tree, and Support Vector Machine) to predict soil total carbon and multiple carbon fractions.
The soil factorial modeling approach is not suitable for projecting soil properties into the future because this would involve extrapolating soil predictions, which is associated with high uncertainty. Thus, factorial soil modeling is limited in its ability to address grand challenges such as the assessment of future soil and food security. Instead, state-of-the-art factorial models are now widely used in the pedometrics community to compute relations between soil-forming factors and S and to apply these machine-fitting models to unsampled locations within a given study region. The authors of (30) computed soil organic carbon using various AI machine-fitting algorithms, with Random Forest being one of the most popular methods in soil science. AI-ML and AI-DL technology is poised to optimize such models through brute-force fitting of inputs to outputs (31).
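The extrapolation problem can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional covariate and a k-nearest-neighbour regressor standing in for any machine-fitting model; the data, the function name `knn_predict`, and the choice of learner are illustrative assumptions, not taken from the studies cited above. It shows extrapolation in covariate space only, as a stand-in for the broader problem of projecting beyond sampled conditions.

```python
# Hypothetical illustration: a fitted model applied outside its training range.

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Synthetic covariate sampled from 0 to 10, with a synthetic soil property
# that rises linearly with it (values are made up for illustration).
train_x = [float(i) for i in range(11)]
train_y = [2.0 * x + 1.0 for x in train_x]

inside = knn_predict(train_x, train_y, 5.0)    # within the sampled range
outside = knn_predict(train_x, train_y, 20.0)  # far beyond the sampled range

print(f"prediction at x=5:  {inside:.1f} (true value 11.0)")
print(f"prediction at x=20: {outside:.1f} (true value 41.0)")
```

Within the sampled range the fit is accurate; beyond it the prediction stays near the values seen at the edge of the training data, even though the underlying relation keeps rising.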
The implications of AI-based factorial soil modeling are striking. AI models aim to compute the "perfect" connections between soil-forming factors and S to describe the soil-ecosystem, adopting a purely empirical data-driven perspective. Computational advancements have fostered the application of AI-DL algorithms, where DL refers to ML using multiple layers, nodes, and weighting factors of adjustable computing elements (32). Artificial neural networks, such as Convolutional Neural Networks (CNN), adopt the DL paradigm to fit inputs (soil-forming factors) to an output (a specific soil property or class), accounting for the complexity of real-world soil-ecosystems [e.g., the CNN-DSM application in (28)]. According to Liao (33), "deep learning announces its prediction without explaining (in human terms) how it arrived at that prediction" (p. 7). AI-DL soil models idealize an R² of 1.0 and are able to minimize error metrics like no other statistical or geostatistical method. AI-DL models use nodes or layers of nodes to represent pedological knowledge in an abstract form that lacks transparency for soil scientists, land managers, and the general public.
Importantly, this movement from knowledge-based toward data-driven AI-based soil-factorial modeling is a profound paradigm shift from earlier research focused on understanding the human-soil-environmental domain toward soils mapped into machine code. AI-based soil modeling relaxes deterministic-mechanistic assumptions of causality between soil-environmental covariates (cause) and soils (effect); instead, associations between covariates and soil(s) are optimized to identify the best performing model(s). These associations may be spurious; may lack transparency, human meaning-making, and trust in the model; and may be vulnerable to so-called one-pixel attacks (33). By changing one pixel in an image, Harvard researchers were able to get a DL algorithm to classify an image of a car as a dog (33). How do pixel attacks impact soil maps and models? What are the implications of adding or changing one soil observation or one covariate value (e.g., one pixel in a remote sensing image) in an AI-DL soil prediction model?
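One simple way to begin probing these questions is a sensitivity check that alters a single training observation and compares the predictions before and after. The sketch below is a hypothetical illustration only: it uses a one-nearest-neighbour predictor and made-up values, and it is not an adversarial attack in the sense of (33), merely a local perturbation test.

```python
# Hypothetical sensitivity check: how far do predictions move when a single
# training observation is altered?

def nearest_neighbor_predict(train, query):
    """Return the target of the training point closest to `query` (1-NN)."""
    return min(train, key=lambda obs: abs(obs[0] - query))[1]

# Synthetic (covariate, soil property) pairs; values are illustrative only.
train = [(0.0, 1.0), (1.0, 1.2), (2.0, 1.1), (3.0, 1.3), (4.0, 1.2)]

# Perturb one observation, mimicking a single erroneous pixel or sample.
perturbed = list(train)
perturbed[2] = (2.0, 9.9)

grid = [i * 0.5 for i in range(9)]  # prediction locations 0.0 to 4.0
before = [nearest_neighbor_predict(train, q) for q in grid]
after = [nearest_neighbor_predict(perturbed, q) for q in grid]

changed = [q for q, b, a in zip(grid, before, after) if a != b]
max_shift = max(abs(a - b) for a, b in zip(before, after))

print("locations whose prediction changed:", changed)
print("largest change in predicted value:", round(max_shift, 2))
```

For this local learner the damage stays confined to the neighbourhood of the altered observation; how far such a change propagates in a deep network trained on image covariates is exactly the open question raised above.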

What Is "the Perfect" Soil Model of the Future?
The pivotal shift from pedological knowledge discovery of soil-landscape patterns and genesis toward machine-optimized soil models has created enthusiasm, critique, and controversy in the pedometrics research community. Wadoux et al. (34) demonstrated that ML can find relevant soil patterns even with meaningless pseudo-variables: digital portrait photos of pedometricians, used as covariates in the Random Forest algorithm, successfully predicted soil organic carbon.
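A much simpler mechanism than the one studied by Wadoux et al. already shows why fit statistics alone cannot separate meaningful from meaningless covariates. The sketch below is an assumption-laden illustration, not a reproduction of their experiment: it uses random numbers as a pseudo-covariate, an unrelated random soil property, and a one-nearest-neighbour learner, with all names hypothetical.

```python
import random

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def one_nn(train_x, train_y, query):
    """Predict with the target of the single closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]

rng = random.Random(0)
pseudo = [rng.random() for _ in range(200)]       # meaningless covariate
soil = [rng.gauss(2.0, 0.5) for _ in range(200)]  # unrelated soil property

train_x, test_x = pseudo[:100], pseudo[100:]
train_y, test_y = soil[:100], soil[100:]

# Fit statistic on the data the model was trained on.
fit_r2 = r_squared(train_y, [one_nn(train_x, train_y, q) for q in train_x])
# Same statistic on observations the model has not seen.
holdout_r2 = r_squared(test_y, [one_nn(train_x, train_y, q) for q in test_x])

print(f"R2 on training data: {fit_r2:.2f}")
print(f"R2 on held-out data: {holdout_r2:.2f}")
```

The learner reproduces its training data perfectly from a covariate that carries no pedological information, while the held-out score exposes the absence of any real relation. Independent validation and pedological interpretation therefore remain necessary alongside error metrics.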
The prevalent trends in SCORPAN- and STEP-AWBH-facilitated modeling of soils entail increased usage of (1) latent variable and AI-ML methods, specifically the popular Random Forest approach [e.g., (35,36)], (2) hidden nodes, layers, and weighting factors in AI-DL methods [e.g., (29,37)], and (3) applications of automated mapping techniques at global scale (25,38) to identify the best fit between soil, soil-environmental covariates, and spectral data, to minimize uncertainty and bias, and to optimize the accuracy and precision of soil predictions. Such machine-focused research applications reveal "the perfect" soil model in terms of error and uncertainty metrics. However, these kinds of machine-generated soil models are rooted in abstract soil-environmental relations of high orders of dimensionality and complex non-linear interactions among input and output variables. Challenges remain to enhance the transparency and meaning of AI soil models. In AI, black-box (AI-DL) or gray-box (AI-ML) conceptual frameworks are adopted, with the former limiting insight into the soil-environmental relations that govern the soil prediction model. In contrast, "the perfect" soil models using non-AI approaches aim to maximize soil knowledge discovery (e.g., assessing soil carbon or identifying which STEP-AWBH variable relates most strongly to a specific S), which is meaningful to people and researchers alike.
The challenge for future soil-factorial model applications is to account for prior selection of relevant soil-environmental covariates (i.e., SCORPAN or STEP-AWBH factors) based on pedological knowledge as well as rigorous posterior soil model interpretation. Xiong et al. (11) provided a holistic soil-landscape modeling framework that combines knowledge-based and data-driven AI approaches to assess soil organic carbon in Florida. Xiong et al.'s strategic approach is able to discern (1) the all-relevant set (i.e., strongly and weakly relevant STEP-AWBH variables to assess S, which has value for understanding the mechanisms underlying the formation of S), (2) the minimal-optimal set (i.e., a parsimonious and transparent model for end-users), and (3) irrelevant variables (e.g., pseudo-variables that are not meaningful for explaining pedological processes). Challenges remain to harmonize and reconcile knowledge-based and data-driven AI soil modeling approaches to account for the diversity of soil-landscapes.
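The distinction between relevant and irrelevant covariates can be sketched with a shadow-variable comparison, loosely in the spirit of all-relevant feature selection. This is a hypothetical illustration and not the procedure of Xiong et al.: the covariate names, the synthetic data, and the use of absolute Pearson correlation as the relevance score (which captures only linear relations) are all assumptions.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n = 300

# Synthetic covariates; names are hypothetical stand-ins for STEP-AWBH factors.
covariates = {
    "precipitation": [rng.gauss(0, 1) for _ in range(n)],
    "elevation": [rng.gauss(0, 1) for _ in range(n)],
    "pseudo_variable": [rng.gauss(0, 1) for _ in range(n)],  # unrelated to S
}
# Synthetic soil property driven strongly by precipitation, weakly by elevation.
soil = [
    2.0 * p + 0.5 * e + rng.gauss(0, 1)
    for p, e in zip(covariates["precipitation"], covariates["elevation"])
]

# Shadow covariates: shuffled copies that keep each distribution but break any
# relation to the target. Their best score sets the relevance threshold.
shadow_scores = []
for values in covariates.values():
    for _ in range(20):
        shuffled = values[:]
        rng.shuffle(shuffled)
        shadow_scores.append(abs(pearson(shuffled, soil)))
threshold = max(shadow_scores)

scores = {name: abs(pearson(v, soil)) for name, v in covariates.items()}
relevant = [name for name, s in scores.items() if s > threshold]

print("relevance threshold from shadows:", round(threshold, 3))
for name, s in scores.items():
    print(f"{name}: score {s:.3f}")
print("covariates scoring above every shadow:", relevant)
```

A strongly related covariate clears the shadow threshold by a wide margin, whereas a weakly related one may sit close to it, which is why separating weakly relevant from irrelevant variables calls for pedological judgment in addition to the statistic.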

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.