Machine learning algorithms improve MODIS GPP estimates in United States croplands

Results and discussion: The coef ﬁ cient of determination ( r 2 ) of the raw comparison (MODIS GPP to EC GPP) was 0.38, prior to machine learning model incorporation. The optimal model for simulating GPP across all sites was a Stacked Ensemble type with a validated r 2 value of 0.87, RMSE of 2.62 units, and MAE of 1.59. The machine learning methodology was able to successfully simulate GPP across three agroecosystems and two crops.


Introduction
The use of satellite-derived estimates of ecosystem productivity have become somewhat commonplace in ecosystem and agricultural sciences (Huang et al., 2018;Smith et al., 2019;Ai et al., 2020).Estimating plant growth has utility in a wide variety of ecological and agricultural applications, including carbon uptake estimates, yield forecasting, detection of plant pathologies, and detecting ecosystem changes (Steven, 1993;Kerr and Ostrovsky, 2003;Pettorelli et al., 2017).These estimates generally take advantage of the unique way that photosynthesizing plants reflect near infrared radiation (NIR), which can be easily detected with satellite and aerial sensors (Badgley et al., 2017;Baldocchi et al., 2020).Large networks of sites, such as the Long-Term Agroecosystem Research (LTAR) network, provide unique opportunities to analyze plant productivity across multiple collaborative sites over a long period of time, allowing for a better understanding of large-scale spatio-temporal trends.The LTAR network is a collaboration between 18 long-term agricultural research sites across the United States established by the United States Department of Agriculture (USDA) Agricultural Research Service (ARS) and collaborative land-grant universities.The overarching mission of the LTAR network is to provide sustainable solutions for food and fiber production that are currently facing challenges associated with changing climate and increasing resource demands.The LTAR network has been increasingly turning to technological solutions, including remote sensing, to serve as a large-scale indicator (Spiegal et al., 2018;Browning et al., 2021) and solve its pressing questions regarding agricultural sustainability (Kleinman et al., 2018;Boughton et al., 2021;Goodrich et al., 2021).The LTAR network includes a wide range of cropping systems, management practices, and land use histories.Studying the interactions of cropping system and management with carbon flux can be useful when determining best management practices in a variety of systems.
One commonly used satellite output is the Moderate Resolution Imaging Spectroradiometer (MODIS) Gross Primary Productivity (GPP) product.This output provides a measure of total carbon uptake via photosynthesis (GPP)-a major component of the carbon cycle in terrestrial ecosystems.MODIS is a passive sensing instrument aboard NASA's Terra and Aqua satellites that collects spectral data among 36 bands with a temporal resolution of 1-2 days.The MODIS GPP estimate is a pre-processed data product available via NASA-based platforms (Maccherone, 2021).The GPP product derived from MODIS data uses a light-use efficiency-based model that is modulated by biome type.MODIS classifies all cropland into a single cropland biome.This method relates GPP to the light-use efficiency of photosynthesizing plants and the availability of light.The method is common for estimating GPP using remote sensing from a wide range of sensors beyond MODIS (Reeves et al., 2005;Running and Zhao, 2015;Huang et al., 2021).
Remote sensing estimates of GPP, such as the MODIS GPP product, have a number of advantages compared to ground-based methods, including lower cost, ease of use, and ability to estimate GPP in regions where ground-based instruments are impractical.However, the MODIS GPP estimate is prone to underestimation due to uncertainties associated with assumptions used in the method, cloud cover, coarse resolution, and others.Including the classification of all cropland as a single biome type is largely due to the limited spatial resolution (Tuner et al., 2006;Sims et al., 2008;Huang et al., 2018).For instance, a vulnerability of the MODIS GPP product is that the default scalars used in the calculation of the maximum light-use efficiency are not well measured and are lacking a distinction between C 3 and C 4 photosynthetic pathways (Tuner et al., 2006;He et al., 2013;Xin et al., 2015;Huang et al., 2021).Moreover, reports of uncertainty are common among the photosynthetically active radiation (PAR) absorption calculations used by MODIS (He et al., 2013;Cheng et al., 2014).Many authors have succeeded in improving the GPP estimates (more in-line with ground truth data) by modifying the efficiency parameters and PAR input parameters (Sims et al., 2008;Gilabert et al., 2015;Huang et al., 2018;Huang et al., 2021).The process of improving satellite estimates of GPP requires reliable ground truth data from insitu carbon flux measurements.
The most common ground based GPP estimation method is the eddy covariance (EC) method (Novick et al., 2018;Hermes et al., 2019;Baldocchi, 2020).The EC method uses two rapidresponse (i.e., 10 Hz) instruments, an infrared gas analyzer (IRGA) that measures the concentration of the gas of interest (in this case, CO 2 ), and a sonic anemometer that measures the vertical wind speed.The covariance of the simultaneous measurements is gas flux, which in the case of CO 2 is the net ecosystem CO 2 exchange (NEE).GPP is then derived from net ecosystem exchange of CO 2 (NEE) using a variety of flux partitioning methods (Reichstein et al., 2005;Wutzler et al., 2018).This method is widely used to estimate GPP due to its continuous measurement style and accuracy; however, the method has several key drawbacks.The instruments needed for the EC method are expensive, need regular maintenance, and require large flat areas with uniform vegetation for optimal function.Despite these challenges, EC instruments provide a strong control and an in-situ estimation of GPP.EC data is widely available through collaborative research efforts, such as LTAR, and through data-sharing networks, such as AmeriFlux or FLUXNET (Pastorello et al., 2020;Bond-Lamberty, 2018).
The in-situ GPP data bridges the gap between the EC GPP estimates and the MODIS GPP products.There have been numerous successes in bridging this gap using linear regression both to modify parameters used in the MODIS algorithm and to modify GPP outputs (Wang et al., 2012;Fu et al., 2014;Xin et al., 2015).Xin et al. (2015) modified the light use efficiency term using linear regression and in-situ measurements to modify MODIS efficiency terms.Kang et al. (2005) improved MODIS outputs using a cloud correction algorithm.The gap could be more thoroughly overcome through the introduction of more advanced modeling methods.Many models have been developed using MODIS GPP and meteorological data, with machine learning algorithms becoming more common in recent years (Joiner and Yoshida, 2020;Jung et al., 2020;Yu et al., 2021).Machine learning is a method of modeling that uses data to train algorithms that describe the data and allow for the prediction of new data points.This method is increasingly being used for simulating CO 2 and other ecosystem gas fluxes (Yao et al., 2017;Knox et al., 2021;Reed et al., 2021;Shang et al., 2021;Talib et al., 2021).Yang et al. (2007) was able to improve MODIS GPP estimates using support vector machine learning.Similarly, Joiner and Yoshida (2020) estimated global GPP on a yearly time step using MODIS data and Neural Network modeling.Cui et al. (2021) used the support vector machine to improve gap filling and evapotranspiration estimates of EC data.
Here we construct a simple machine learning method by using EC data as a ground-truth (dependent) variable.Whereas the MODIS GPP alongside precipitation, temperature, crop, and agroecosystem serve as the independent variables.Using these variables and the AutoML machine learning function of the h2o package, the objectives of this study are to determine: 1) The feasibility of utilizing combined datasets across multiple LTAR sites to estimate GPP using machine learning algorithms, and 2) establish an estimation of GPP that can be used as part of a carbon balance proxy, to serve as a supporting indicator of the sustainability goals of the LTAR network.

Site and EC data selection
The LTAR network has 18 sites of which 13 are solely or partly cropland sites (Figure 1).Despite the establishment of EC towers across the network, many are new, thus limiting the amount of data collected to date.Of the 13 LTAR cropland EC agroecoregions, three had enough EC data for use in machine learning algorithms.These three sites include the Kellogg Biological Station (KBS), the Platte River High Plains Aquifer (PRHPA), and the Upper Mississippi River Basin (UMRB) sites (Figure 1; Table 1).All of these sites are part of LTAR's Common Experiment, where similar methods and management practices are used across multiple sites.
The KBS LTAR site is located near Battle Creek, Michigan, United States (42.4376, and is operated as an LTAR site   (Abraha et al., 2019;Robertson and Chen, 2021).Each EC tower is equipped with a LI-7500 IRGA (LI-COR Biosciences, Lincoln, NE, United States) and a CSAT3 sonic anemometer (Campbell Scientific, Logan, UT, United States).Air temperature and precipitation were determined with ancillary instruments on the EC tower.Data was processed using the EdiRe system (University of Edinburgh, Edinburgh, Scotland, United Kingdom).This processing included flagging low-quality data, performing corrections for sonic temperature and humidity, planar fit coordinate rotation, and corrections for air density.These are typical corrections used in the processing of Eddy Covariance data (Abraha et al., 2019;Burba, 2022).
The PRHPA site is located near Omaha, Nebraska, United States (41.1651, −96.4766) and is operated under a partnership between USDA-ARS and the University of Nebraska-Lincoln and is part of the Platte River High Plains Aquifer agroecoregion (Bean et al., 2021).The average annual temperature is 10.1 °C and the average precipitation is 790 mm.Three EC towers were established in 2001 and remain in operation.The three EC towers are in three no-till fields that are operated with the following production cycles: 1) Irrigated continuous maize; 2) irrigated maize/soybean rotation; and 3) rainfed maize/soybean (Suyker, 2021a;Suyker, 2021b;Suyker, 2021c).Irrigation managements was performed with a center-pivot.LI-7200 IRGA (LI-COR Biosciences) and R3-100 sonic anemometer (Gill Instruments, Hampshire, United Kingdom) were used at the site.On-site raw processing was completed using custom code and included typical corrections for EC data as discussed previously.
The UMRB site is located near Minneapolis-St.Paul, Minnesota, United States (44.7143, −93.0898) and is operated under a partnership between USDA-ARS and the University of Minnesota and is part of the Upper Mississippi River Basin agroecoregion (Bean et al., 2021).The average annual temperature is 6.4 °C and the average annual precipitation is 879 mm.Three of the EC towers (UMRB 1, UMRB 2, and UMRB 3) were established in 2003 and were dismantled in 2016 when the site was developed.A new EC tower (UMRB 4) was established in a nearby site in 2017 and is still in operation.All tower sites were managed as rainfed maize/soybean rotation with chisel plow tillage (Baker and Griffis, 2019;Baker and Griffis, 2021).LI-7500 IRGA and CSAT3 sonic anemometer were used at the site.Raw data were processed on-site using custom code prior to data sharing.Data processing involved standard corrections applied to EC data as previously discussed.Air temperature and precipitation were measured at all three sites (KBS, PRHPA and UMRB) with ancillary instruments on the EC towers.
REddyProc uses a moving-window-based algorithm to fill gaps in EC data and is one of the widely used methods (Reichstein et al., 2005;Wutzler et al., 2018).An average gap of 15.7% is relatively low compared to other eddy covariance datasets (Falge et al., 2001;Hui et al., 2004;Moffat et al., 2007) NEE fluxes were partitioned into GPP and R eco in REddyProc using a relationship between nighttime NEE and air temperature to estimate R eco , assuming nighttime NEE fluxes are equal to R eco .GPP was then calculated by adding R eco to NEE (Lloyd and Taylor, 1994;Reichstein et al., 2005).EC GPP will be referred to in this paper as GPP EC from here outwards.

MODIS data acquisition and processing
MODIS data were pulled from the MODIS/006/MOD17A2H collection (Running et al., 2015) through Google Earth Engine Code Editor (Gorelick et al., 2017).Quality control (QC) bits 5-7 provided a 5-level confidence quality score where "0" indicated the "very best possible" quality (e.g., absence of the clouds).Images with the value of "0" were used while the remaining scores were all masked.Once the desirable images were selected, they were saved to a list and exported to Google Drive.The imagery data were then subject to geospatial processing methods incorporating averaged zonal statistics representing the area of each individual field using ArcGIS Pro (ESRI, Redlands, CA, United States).The spatial resolution for this product is 1 km 2 .The typical eddy covariance tower has a flux footprint (radius) around 150-200 m 2 , giving it an effective spatial resolution that is similar to that of the MODIS product.Across the studied sites, the average EC flux location consisted of 1.05 MODIS pixels.

MODIS GPP algorithm
MODIS calculates GPP using a light-use efficiency-based model as follows: where APAR is the absorbed photosynthetically active radiation (PAR) and ε is the coefficient of radiation use efficiency (Reeves et al., 2005;Running and Zhao, 2015).ε is calculated using the maximum ε, and terms for water, temperature stress, and other environmental factors that come from the Biome Specific Parameters Look Up Table (BPLUT, https://www.ntsg.umt.edu/files/modis/MOD17UsersGuide2015_v3.pdf).For this dataset, cropland BPLUT was used (Running and Zhao, 2015;Huang et al., 2021).APAR is calculated by modifying incoming PAR using cloud cover, aerosol interference, leaf area, day length, and incident angle (Running and Zhao, 2015).As the MODIS product is an 8-day sum, a daily value was obtained by dividing the value by 8.

Machine learning model
The h2o package (LeDell et al., 2021) provides the AutoML function, which is an automated, supervised machine learning algorithm.It trains the model by utilizing a variety of other algorithm types such as: Gradient Boosting Machines (GBM), Generalized Learning Models (GLM), and eXtreme Gradient Boosting Models (XGBoost).AutoML uses three pre-specified XGBoost GBM models, a fixed grid of GLMs, a default Random Forest (DRF), five pre-specified H2O GBMs, a near-default Deep Neural Net, and Extremely Randomized Forest (XRT), a random grid of XGBoost GBMs, a random grid of H2O GBMs, and a random grid of Deep Neural Nets (H2O AutoML, 2022).AutoML includes stacked ensembles, which is a type of algorithm that additionally trains with a second-level meta-learner to find the best combination of the base learners (LeDell and Poirier, 2020).The models were trained to predict the daily GPP values based on the combination of MODIS-derived daily GPP (MODIS GPP), Julian day of year (DOY), air temperature, precipitation, agroecoregion, and crop.While MODIS includes biome types, the biomes used by MODIS are very broad (i.e., cropland, grassland, deciduous forest) and all sites in this study fall into the cropland biome, including location that allows for more specific ecoregions to be included.This model was limited to only maize and soybean due to limited data availability for other crop types.Several algorithm types such as Random Forest, k-nearest neighbor, and XGBoost were trained separately and as part of AutoML's stacked ensemble in R (LeDell and Poirier, 2020).For reproducibility, parameters such as "max_models" and "max_runtime" were set to 500 and unlimited, respectively.This allowed AutoML to generate 500 models for each run with no limitations on time and then select the best-performing ones overall as well as within each algorithm type.The best models were identified by Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).It should be noted that XGBoost is not currently available on Windows, which is why MacOS was used in this step.
Cross-validation (CV) is used to understand how well a model will likely perform in an actual use-case scenario (Friedman et al., 2001).There are several available methods to use for CV, each having its own advantages and disadvantages (Arlot and Celisse, 2010).h2o AutoML uses K Fold CV by default, a method introduced by Geisser (1975), to provide a better estimate of how well the model will perform on new data.K Fold CV removes part of the training data and tests the model on the removed portion.The data is divided into k subsets, and then the training is repeated k times using a different combination of subsets each time.The accuracy metrics are averaged across all versions to produce the CV results.In our case, k = 5, which means that 80% of the data was used to predict on the remaining 20%.While the CV results can give a reasonably accurate view of model performance, we chose to keep some years separate from the training for an independent model validation.Only the validation results from this last step are utilized in this paper to provide a more accurate reflection of the models' performance instead of the CV results that can sometimes appear inflated.The data were split three ways across entire years to ensure model robustness: 1) Models were trained on older years (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) and tested on recent years (2016-2019); 2) Models were trained on recent years (2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) and tested on older years (2004)(2005); and 3) Models were trained and tested on years selected at random (testing years: 2009, 2011, 2014, 2017).Due to the variability of observations between years, the number of years for each region were chosen to maintain as close to an 80:20 training/testing split as possible for a more realistic view of the model's performance (see Saeb et al., 2017).A summary of all models run is included in the supplementary materials.

Statistical analyses
The statistical methods used for comparing modeled GPP to GPP EC were coefficient of determination (r 2 ; Eq. 2), Root Mean Square Error (RMSE; Eq. 3), and Mean Average Error (MAE; Eq. 4).R-squared measures the amount of variability in the predicted variable that can be explained by the model with values ranging from 0 to 1. RMSE is the sum of the square of prediction error for each observation.MAE is the sum of the absolute value of error.We used r 2 and RMSE to easily compare across models, while MAE was used for ease of communicability to the general public because it is in a format that is understood by the average farmer (i.e., gC m −2 day −1 as opposed to a root/squared value or something between 0 and 1 that may not be comparable across fields).These indices were calculated as follows: where n is the number of data points, ŷ is the actual value (GPP EC ), y i is the predicted value (modeled GPP), and y is the mean value.Linear regression of the GPP MODIS and GPP EC was performed using SigmaPlot (Version 14.0, Systat Software, Berkshire, United Kingdom).Differences in model success between management practice and sites were determined by comparing MSE and MAE values.
3 Results and discussion

Comparison of GPP MODIS to GPP EC
The linear regressions between the GPP MODIS (prior to modeling) and GPP EC vary greatly among sites (Figure 2).The strongest correlation was found at the PRPHA site with maize; however, this could possibly be an artifact of greater data availability (17 years per EC tower versus 9 and 12 at KBS and UMRB, respectively) for that site/crop combination.Larger datasets, with increased data available for training, are associated with lower error than smaller datasets (e.g., Faber et al., 2016;Schmidt et al., 2017;Zhang and Ling, 2018).Across all sites, GPP MODIS has a tendency to underestimate GPP during the peak growing season and overestimate GPP in the seedling establishment and senescence phases.Underestimation was most pronounced at the peak of the growing season, often by > 100 gC per 8-day measurement period.Overestimation was most pronounced in the late spring just at the very beginning of the growing season with overestimations of around 50-60 gC per 8-day measurement period being common.Tuner et al. (2006) reported that GPP MODIS product tended to overestimate during low productivity and underestimate during high productivity (i.e., peak growing season) across multiple biomes including croplands.Other studies also found significant underestimation of GPP EC by the raw GPP MODIS product, often with similar magnitudes to our results (Wang et al., 2012;Tang et al., 2015;Huang et al., 2018).
A best fit line between GPP EC and GPP MODIS was determined for all sites as one dataset using linear regression (Figure 3).The r 2 of the relationship with all sites pooled is not different from findings of sites being analyzed individually, potentially indicating a universal model for simulating GPP for all sites (i.e., toward a much greater

FIGURE 2
Relationship between the raw Moderate Resolution Imaging Spectroradiometer (MODIS) gross primary productivity (GPP) and eddy covariance GPP using linear regression at each site.The best fit lines use the same color scheme as the points that the regression is showing the relationship of.All slopes are significant to p < 0.0001.Platte River High Plains Aquifer (PRHPA) had 1,007 and 299, observations for irrigated and rainfed maize, respectively and 285 and 344 for irrigated and rainfed soy, respectively.Kellogg Biological Station (KBS) had 342 and 355 observations for AGR and CRP, respectively.UMRB had 303 and 349 observations for maize and soy, respectively.

FIGURE 3
Linear regression of the raw Moderate Resolution Imaging Spectroradiometer (MODIS) gross primary productivity (GPP) and eddy covariance GPP at all sites as 8-day sums (A) for maize (red solid) and soybean (blue dashed), and (B) for both maize and soybean in one dataset with the best fit line in red.All slopes are significant to p < 0.0001.The total number of datapoints was 3,030 (2,186 for maize and 844 for soy).
utility than a model that works for one site alone).There are differences in the correlation between GPP EC and GPP MODIS between the maize and soybean crops, with a stronger correlation in soybean sites.The efficiency parameter in MODIS is based on C 3 photosynthesis, which has been shown to lead to greater underestimation of GPP in C 4 -dominated systems (Wang et al., 2012;Running and Zhao, 2015;Huang et al., 2021).There were also differences between sites in the efficacy of the MODIS GPP at directly estimating crop GPP, which may be related to the coefficients used in the estimation.The coefficients (i.e., light use efficiency (ε), temperature response, vapor pressure response, etc.) used by MODIS were the same for all three sites, regardless of differences in climate (Running and Zhao, 2019).

Model results
All agroecoregion/crop combinations were analyzed first, individually, using the three data training/testing splits (Table 2).To re-iterate, all observations classified as the testing datasets were not included in the training datasets.Supervised machine learning has an unavoidable issue of overfitting, mostly due to the limits of training data or the constraints of algorithms that are too complicated and require an abundance of parameters (Ying, 2019).Training/validation data were kept separate in an effort to minimize overfitting, with only the validation results communicated here.At PRHPA, the maize models had an average validation r 2 of 0.88, RMSE of 2.82 gC m −2 day −1 , and MAE of 1.71 gC m −2 day −1 ; for soybean the average validation r 2 was 0.78, RMSE was 2.61 gC m −2 day −1 , and MAE was 1.73 gC m −2 day −1 .For both crops combined at PRHPA the model success was similar to single crop models with an average r 2 of 0.85, RMSE of 2.93 gC m −2 day −1 , and MAE of 1.77 gC m −2 day −1 .At UMRB, the maize only models had an average r 2 of 0.86, a RMSE of 3.06 gC m −2 day −1 , and a MAE of 2.0 gC m −2 day −1 ; with soybean only models the average r 2 was 0.76, RMSE was 2.06 gC m −2 day −1 , and MAE was 1.38 gC m −2 day −1 .For both crops combined at UMRB the model success was similar to that of single crop models with an average r 2 of 0.84, an RMSE of 2.6 gC m -2 day -1 , and an MAE of 1.64 gC m -2 day -1 .At KBS only maize was grown, and the models had an average validation r 2 of 0.77, RMSE of 3.14 gC m -2 day -1 , and MAE of 1.95 gC m -2 day -1 .All best-fit models were Stacked Ensemble-type models, likely due to their second-level learning approach by utilizing what they learn from the base learners to inform the meta-algorithm or super learner.
In both agroecosystems with maize and soy, the model success with both crops combined into a single dataset was similar to that of single crop models, indicating that this method can be used with multiple crops with different growth patterns and photosynthetic pathways in the same model.Model validation RMSE ranged from 2.06 to 3.14 gC m -2 day -1 depending on location and crop.Given that the typical maximum daily GPP EC was around 25-30 gC m -2 , the observed error was considerably less than a day's carbon update and was well within the range of expected values from other studies.Other remote sensing GPP modeling efforts (primarily linear regression) reported daily RMSE values ranging from 2.6 gC m -2 (Nguy- Robertson et al., 2015), 1.9 to 12.1 gC m -2 depending on efficiency term (Cheng et al., 2014), 0.5 to 2.0 gC m -2 depending on ecosystem and methodology (Gilabert et al., 2015), 3.8 gC m -2 (He et al., 2013), and 0.8 to 7.5 gC m -2 depending on site and methodology (Huang et al., 2021).The error range found here is well within the range reported by other studies, showing that this method is suitable for simulating cropland GPP.

Comparison of all site data
The models were then run with all site data combined into a single dataset (Table 3; Figure 4).When all agroecoregions and crops were combined into one dataset, the model validation r 2 was 0.85, RMSE was 2.77 gC m -2 day -1 , and MAE was 1.67 gC m -2 day -1 .As a control, a model set where the only input was MODIS GPP was also created; this model was much weaker (r 2 : 0.52), showing that the addition of other variables (i.e., climate, location, crop) greatly improved model success.The error shown with the single dataset was within the range seen in individual datasets and similar those seen in other studies (Guo et al., 2023;He et al., 2013;Nguy-Robertson et al., 2015;Reed et el., 2021).Duan et al., 2021 found a similar error using Random Forest when modeling for rice (Oryza sativa), but was more accurate when modeling wheat (Triticum Aestivum).When looking at the regression between modeled and observed GPP, as shown in Figure 4, the slope of the relationship is close to 1.0.The slopes were similar across data splits, 1.01 for early year, 1.05 for late, and 0.96 for random, indicating a near 1: 1 relationship between modeled and observed GPP.This 1: 1 relationship further indicates that Stacked Ensemble machine learning can reliably estimate GPP across various data splits.However, it is worth noting that there was a greater spread of data points about the slope at higher GPP values (both with machine learning and in the original data comparison), indicating a potential oversaturation of the remote sensing data.While the combined dataset had more training data, it had two crops and three locations to model for, potentially complicating the modeling effort, as a result, it is performance was similar.Maize and soybean have very different growth habits and photosynthetic pathways, likely making best-fit models different for each crop.The MODIS17 product uses a light-use efficiency method but does not correct for differences in C 3 and C 4 photosynthetic pathways.In-situ measurements of light use efficiency have found considerably different efficiency values for maize and soybean due to differences in plant physiology, including canopy structure and photosynthesis biopathways (Gitelson et al., 2015;Xin et al., 2015;Gitelson et al., 2018).However, by including the crop type as a variable, the AutoML algorithms successfully distinguished between the two biopathways.
As with analysis by agroecosystem, all best-fit models for the combined datasets were Stacked Ensemble-type models.As discussed previously, Stacked Ensemble is a machine learning method that combines multiple learning methods (i.e., GBM and XGBoost) by using the output of one model as the input for another (Rajadurai and Gandhi, 2020;Mohebbian et al., 2021).Stacked Ensemble is a robust approach that can work with many data types and uses (Zai and Chen, 2018;Rajadurai and Gandhi, 2020).Stacked Ensemble methods have been found to frequently outperform single models across many data types (Zhai and Chen, 2018;Chowhurdy et al., 2019;Singh et al., 2019;Jangam and Annavarapu, 2021).

Implications for future research
This study has provided insight into the potential of using machine learning methodology to estimate GPP using readily available inputs (MODIS GPP product, air temperature, precipitation, crop type, and agroecosystem) across the LTAR network and croplands.This framework has the potential to allow for network-wide estimations of carbon uptake across the Common Experiment and other sites, even where EC towers are not present, and to further network goals of understanding cropland carbon dynamics.Combined models (including multiple agroecoregions in the same model) can account for regionspecific differences by using agroecosystem region as an input in the training phase.The combined model will allow for more largescale carbon inventories without compromising on accuracy when compared to site and crop-specific models (combined model r 2 :  were used as validation and the earlier years (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) were used as training (618 data points).The bottom graph is where random years were used for validation (2009, 2011, 2014, and 2017) with the remainder as training (574 data points).
0.85 average site/crop-specific model r 2 : 0.82), likely owing to the greater pool of training data for larger models.Croplands cover about 12% of the Earth's ice-free land area (IPCC, 2019).Sustainable management on these lands can maintain or improve productivity while contributing to climate change mitigation and adaptation goals (Tang et al., 2015;IPCC, 2019;Browning et al., 2021).As a component of the carbon cycle, cropland GPP is an important indicator of productivity and sustainability, and its monitoring can contribute to furthering sustainability goals (Beer et al., 2010;Gilabert et al., 2015;Huang et al., 2018;Browning et al., 2021).GPP monitoring can also be valuable in understanding the short-term effects of extreme weather events on carbon dynamics, including droughts, floods, and intense storms (Ciais et al., 2005;Menefee et al., 2020;Yin et al., 2020).Given that few cropland sites have in-situ GPP monitoring, regional and global GPP estimates rely on remote sensing and modeling to estimate large-scale carbon uptake (Kalfas et al., 2011;Huang et al., 2018;Smith et al., 2019).This method of incorporating machine learning allows for a more flexible model that can apply to broad areas and pick up on trends not seen in process-based modeling (Beer et al., 2010;Jung et al., 2011;Xiao et al., 2014).Expanding this methodology to broader regions and more sites to create LTARwide carbon flux estimates is a future goal of this project.
Machine learning methods have already been widely used to successfully estimate global GPP on annual timesteps with various algorithm types (Beer et al., 2010;Jung et al., 2011;Xiao et al., 2014).Machine learning estimated global evapotranspiration, CH 4 emissions, and NEE on global, field, and regional scales (Yao et al., 2017;Knox et al., 2021;Shang et al., 2021;Talib et al., 2021).Applying these methods to agricultural lands can quantify carbon cycle contributions from agriculture and determine best management practices for carbon sequestration.Practically any field of suitable size can leverage the power of machine learning by utilizing methods as described in this paper, which, given the wide accessibility of input data, should make this type of analysis feasible for any large-scale cropping system.The methods employed provide a simple solution that can be followed with minimal experience/knowledge with machine learning.However, further improvement of a given algorithm's output via adjusting hyperparameters is limited to the range of values disclosed by the h2o package.Currently, the full range required for a sensitivity analysis is undisclosed by the h2o package.Data limitations from EC towers often arise due to the high costs of the equipment; the limited data can possibly result in commonly known issues of overfitting with machine learning.Nevertheless, the extensive reach and robustness of machine learning-based carbon models, as demonstrated here, make it an ideal method for future work in understanding cropland carbon uptake and climate interactions.Future steps aim to apply this methodology to more sites within the LTAR network.The LTAR network has contributed to our collective understanding of cropland biogeochemical cycling across the United States, and it is our hypothesis that the addition of machine learning methods will enhance these network analyses.

Conclusion
In this study, we showcased the applicability of machine learning to estimate GPP across LTAR croplands using MODIS satellite imagery, weather, and agroecoregion as input data.The MODIS GPP product, while correlated with in-situ GPP, frequently underestimated GPP during peak growing season and overestimated during seedling establishment and senescence.In simulating GPP at individual tower sites, model performance was best at sites with larger quantities of data available for model training.The machine learning methods also work well with all sites combined into one dataset, particularly for maize.Combined datasets provide more training data for the machine learning algorithm to work with and can thus improve model success over individual site-scale models.The success of machine learning at modeling GPP across three LTAR sites is a first step towards applying this methodology to the network as a whole.

FIGURE 1
FIGURE 1 Spatial locations of the three agroecoregions used in this study, all within the Long-Term Agroecosystem Research (LTAR) network.Created using the LTAR network shapefile, published under CC0-1.0.

TABLE 1
(Bean et al., 2021) 13 LTAR eddy covariance (EC) flux measurements sites.AmeriFlux site ID is provided in parentheses.apartnershipbetween USDA-ARS and Michigan State University(Bean et al., 2021).The average annual temperature is 9.9 °C and the average annual precipitation is 1,027 mm.Two EC towers were established in 2009 and remain in operation.Since 2009 had a different crop system compared to the rest of the years, 2009 data was not used in this study.The EC towers are in two different fields; one has been cropland since 1938 and the other was converted from Conservation Reserve Program (CRP) perennial grassland to cropland in 2009.Both sites are managed as no-till with continuous rainfed maize through

TABLE 2
The average results for machine learning modeling of individual agroecosystem GPP are shown below.Average is across the three temporal validation data splits.Units are gC m -2 day -1 .

TABLE 3
The average results for combined agroecosystem/crop GPP modeling are shown below.Average is across the three temporal validation data splits.Units are gC m -2 day -1 .