Commentary: Evaluating Risk and Possible Adaptations to Climate Change Under a Socio-Ecological System Approach

INTRODUCTION
Maize yields are highly dependent on meteorological conditions (Ray et al., 2015), and climate change could lead to a significant reduction in yields, especially at tropical latitudes (Rosenzweig et al., 2014). This is relevant in Mexico, where maize is by far the most widely cultivated crop, grown extensively by small-scale farmers with little to no access to technology, insurance, or financial services (SAGARPA-FAO, 2014). The study by Haro et al. (2021; hereafter Haro21) provides a much-needed assessment of the socioecological risks facing rainfed maize cultivation in Mexico due to climate change. However, given the spatiotemporal structure of maize yields in Mexico and the machine learning modeling framework used by Haro21, additional justification and more robust validations are needed to substantiate their findings.

VALIDATING A MAIZE YIELD MODEL: BEYOND R²
Mexico is a very diverse country in social, economic, and ecoclimatic terms. These factors give rise to large geographic variability in rainfed maize yields, wherein some municipalities exceed 8 ton/ha, while others remain below 0.5 ton/ha (Figure 1). To deal with the complex problem of modeling maize yields in Mexico, Haro21 trained a random forest (RF) tree ensemble using socioeconomic and climatic variables for the 2003-2018 period. They cross-validated their model by randomly splitting the dataset into training (70% of the data) and testing (the remaining 30%) datasets. Their model achieved R² ≈ 0.65, which was interpreted as a robust criterion for validation. Nonetheless, this R² value is insufficient to validate the model for projections under climate change because most of the variability in yields occurs geographically, a dimension that is highly dependent on socioeconomic factors. More specifically, it is possible that Haro21's model achieved high R² through its ability to fit the data geographically, while having a poor representation of yield sensitivity to changing climatic conditions.
Figure 1. Mean yield at the municipality scale for rainfed maize cultivated during the Spring-Summer cycle in Mexico, as indicated by the label bar. Two graphs displaying yield time series are included to contrast a high-yielding municipality (Zapotlán del Rey, Jalisco) with a low-yielding municipality (Villa García, Zacatecas). Clusters of municipalities with high (>4 ton/ha), moderate (between 1.5 and 4 ton/ha), and low (<0.7 ton/ha) mean yields are enclosed by the green, blue, and red lines, respectively. With data from SIAP (2018).
To give an example, consider a "mean_model" that estimates the yield of each municipality as the mean of its yields over all years available between 2003 and 2018. To cross-validate the mean_model, 70% of the data is used for training, while the remaining 30% is used for computing R² (only municipalities with at least one entry in the training set are considered). Using this framework on the same dataset considered by Haro21 for the Spring-Summer cycle (SIAP, 2018), mean_model consistently attains R² ≈ 0.77, despite having no temporal structure. The skill of the mean_model lies in its ability to locate municipalities, because yield variations are dominated by the spatial dimension. Clearly, large R² values do not guarantee a model's suitability for climate change projections.
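The mean_model experiment can be illustrated with a minimal sketch in Python (standard library only). The data below are synthetic and purely hypothetical: the number of municipalities, the range of municipality yield levels, and the size of the interannual noise are assumptions chosen only so that spatial variability dominates, and they are not the SIAP (2018) records. The function names are likewise illustrative.

```python
import random
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSR / SST."""
    mean_obs = sum(observed) / len(observed)
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ssr / sst


def make_synthetic_yields(n_municipalities=300, years=range(2003, 2019), seed=0):
    """Hypothetical data: large spread between municipalities,
    small year-to-year spread within each municipality."""
    rng = random.Random(seed)
    records = []
    for muni in range(n_municipalities):
        level = rng.uniform(0.5, 8.0)  # assumed municipality yield level (ton/ha)
        for year in years:
            noise = rng.gauss(0.0, 0.6)  # assumed interannual variation (ton/ha)
            records.append((muni, year, max(level + noise, 0.0)))
    return records


def mean_model_r2(records, train_fraction=0.7, seed=0):
    """Random 70/30 split; predict each test entry with the training-set
    mean of its municipality, ignoring the year entirely."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    train, test = shuffled[:n_train], shuffled[n_train:]

    yields_by_muni = defaultdict(list)
    for muni, _year, value in train:
        yields_by_muni[muni].append(value)
    muni_mean = {m: sum(v) / len(v) for m, v in yields_by_muni.items()}

    observed, predicted = [], []
    for muni, _year, value in test:
        if muni in muni_mean:  # skip municipalities absent from the training set
            observed.append(value)
            predicted.append(muni_mean[muni])
    return r_squared(observed, predicted)


records = make_synthetic_yields()
print(f"mean_model R^2 on synthetic data: {mean_model_r2(records):.2f}")
```

Under these assumptions the score is high even though the predictor never uses the year, which is the point of the example. The exact value depends on the assumed ratio of spatial to temporal variance.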

METHODOLOGICAL LIMITATIONS FOR MAKING FUTURE CLIMATE PROJECTIONS
The need for more detailed validations is further substantiated by specific aspects of Haro21's methodology, including the use of bioclimatic variables for modeling maize yields under climate change scenarios. In this regard, climatic impacts on maize yields vary as a function of phenological stage (Tsimba et al., 2013; Sah et al., 2020), and it is unlikely that yearly aggregated variables can capture such effects throughout a country with diverse planting/harvesting dates covering two cultivation cycles (SIAP, 2018). Thus, it is unclear whether bioclimatic variables offer increased predictive ability in varying climatic conditions, or if they merely enhance the model's skill by providing environmental cues for locating a datapoint's likely region of origin.
Another relevant matter pertains to the spatial clustering evident in Figure 1, wherein contiguous regions have similar mean yields. RFs are not designed to account for such spatial autocorrelation (Hajjem et al., 2014; Santibanez et al., 2015; Hengl et al., 2018), and biases can arise when data are interdependent and not identically distributed (Darrell et al., 2015). For instance, in the RF training step of fitting a regression tree (Hastie et al., 2004), spatiotemporal patterns within high-yielding regions might be prioritized over those in low-yielding regions, because the larger variations in the former contribute more to minimizing the sum of squared residuals (SSR). Moreover, given that yield variability in Haro21's dataset is strongly dominated by the spatial dimension, SSR minimization will primarily shape trees to determine yield levels by associating inputs with a (herein loosely defined) geographic cluster, with no guarantee that the model will have any temporal predictive skill.
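The dominance of the spatial dimension can be made explicit with the standard decomposition of the total sum of squares into a between-group and a within-group term. The notation is an assumption introduced here for illustration: y_{m,t} is the yield of municipality m in year t, n_m is the number of years available for municipality m, \bar{y}_m is the mean yield of municipality m, and \bar{y} is the mean over all observations.

```latex
\underbrace{\sum_{m}\sum_{t}\left(y_{m,t}-\bar{y}\right)^{2}}_{\text{total}}
=
\underbrace{\sum_{m} n_{m}\left(\bar{y}_{m}-\bar{y}\right)^{2}}_{\text{spatial (between municipalities)}}
+
\underbrace{\sum_{m}\sum_{t}\left(y_{m,t}-\bar{y}_{m}\right)^{2}}_{\text{temporal (within municipalities)}}
```

A predictor that returns only \bar{y}_m leaves an SSR equal to the temporal term, so its in-sample R² equals the spatial share of the total. When that share is large, most of the achievable SSR reduction is available without representing any interannual variation.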

DISCUSSION: A PATH FORWARD
The observations presented herein make clear that temporal analyses are crucial for validating a maize yield model. Indeed, the primary means to establish the model's ability to capture climatic impacts is through its accuracy in representing interannual yield variations. Many pertinent assessments exist, such as computing R² over anomalies (yields minus the municipality mean), which removes the relationship between yield levels and geography (e.g., see Müller et al., 2017). In addition, pointwise (municipality-wise) linear correlations and Nash-Sutcliffe efficiencies can help identify regions where the model's performance is satisfactory and those where it is poor. Yet another alternative is to formulate the cross-validation procedure as "leave-one-(year)-out" (Thorp et al., 2007), which is commonly used in modeling applications with few years of data. Any of these temporal validations could help Haro21 address concerns about their modeling framework.
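Three of these temporal checks can be sketched in Python (standard library only). This is one possible formulation rather than the procedure of any cited study. The record layout (municipality, year, observed, predicted), the function names, and the choice to center observations and predictions on their own municipality means are all assumptions made for illustration. The toy numbers are hypothetical.

```python
from collections import defaultdict


def efficiency(observed, predicted):
    """1 - SSR / SST. This is R^2 over pooled data, and the Nash-Sutcliffe
    efficiency when applied to one municipality's time series.
    Assumes the observations are not all identical (SST > 0)."""
    mean_obs = sum(observed) / len(observed)
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ssr / sst


def group_by_municipality(rows):
    """rows: iterable of (municipality, year, observed, predicted)."""
    groups = defaultdict(list)
    for muni, _year, obs, pred in rows:
        groups[muni].append((obs, pred))
    return groups


def anomaly_r2(rows):
    """R^2 over anomalies: each municipality's mean is removed from its
    observations and from its predictions (assumed convention)."""
    obs_anom, pred_anom = [], []
    for series in group_by_municipality(rows).values():
        obs_mean = sum(o for o, _ in series) / len(series)
        pred_mean = sum(p for _, p in series) / len(series)
        for obs, pred in series:
            obs_anom.append(obs - obs_mean)
            pred_anom.append(pred - pred_mean)
    return efficiency(obs_anom, pred_anom)


def nse_by_municipality(rows):
    """Nash-Sutcliffe efficiency of each municipality's time series."""
    return {
        muni: efficiency([o for o, _ in series], [p for _, p in series])
        for muni, series in group_by_municipality(rows).items()
    }


def leave_one_year_out(rows):
    """Yield (held_out_year, train_rows, test_rows); works on any tuples
    whose second field is the year."""
    for held_out in sorted({row[1] for row in rows}):
        train = [row for row in rows if row[1] != held_out]
        test = [row for row in rows if row[1] == held_out]
        yield held_out, train, test


# Toy check (hypothetical numbers): predictions equal each municipality's mean.
observations = {"high": [7.0, 8.0, 7.5, 8.5], "low": [0.4, 0.6, 0.5, 0.7]}
rows = []
for muni, series in observations.items():
    muni_mean = sum(series) / len(series)
    for year, obs in zip(range(2003, 2007), series):
        rows.append((muni, year, obs, muni_mean))

pooled_r2 = efficiency([r[2] for r in rows], [r[3] for r in rows])
print(f"pooled R^2:  {pooled_r2:.3f}")
print(f"anomaly R^2: {anomaly_r2(rows):.3f}")
```

In this toy case the mean-only predictor scores close to 1 on pooled R² yet has essentially zero anomaly R² and zero Nash-Sutcliffe efficiency in every municipality, so the temporal metrics expose the lack of interannual skill that the pooled score hides.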
The abovementioned limitations in Haro21's methodology indicate possible areas of improvement. For example, the spring-summer and autumn-winter cultivation cycles can be modeled independently, and climatic variables could be reformulated over relevant periods between planting and harvest. Haro21 could also benefit from using a machine learning framework suitable for modeling spatiotemporal data, including alternatives based on RF. The RF for spatiotemporal predictions of Hengl et al. (2018) accounts for point interrelations based on spatial proximity. Another option is to model spatial clustering as a random effect via the mixed-effects RF of Hajjem et al. (2014), which has shown improvements over traditional RFs in recent crop modeling applications (Sahajpal et al., 2020a,b).
This commentary does not pertain to Haro21 as a whole. Their socioecological systems approach holds much promise, and their goals are highly pertinent to the risks facing agriculture in Mexico under climate change. But additional justification and robust validations are needed to substantiate the suitability of their maize yield model for future climate projections.

AUTHOR CONTRIBUTIONS
The author confirms being the sole contributor of this work and has approved it for publication.