AUTHOR=Angel Yoseline, McCabe Matthew F.

TITLE=Machine Learning Strategies for the Retrieval of Leaf-Chlorophyll Dynamics: Model Choice, Sequential Versus Retraining Learning, and Hyperspectral Predictors

JOURNAL=Frontiers in Plant Science

VOLUME=Volume 13 - 2022

YEAR=2022

URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.722442

DOI=10.3389/fpls.2022.722442

ISSN=1664-462X

ABSTRACT=Monitoring leaf chlorophyll (Chl) in situ is labor-intensive, limiting representative sampling for detailed mapping of Chl variability at field scales across time. Unmanned aerial vehicles (UAVs) and hyperspectral cameras provide flexible platforms for observing agricultural systems, overcoming this spatio-temporal sampling constraint. However, effectively translating point-to-field scale relationships is still required. Here, we evaluate a customized machine learning (ML) workflow for retrieving multi-temporal leaf-Chl levels, combining sub-centimeter resolution UAV-hyperspectral imagery (400-1000 nm) with leaf-level reflectance spectra and SPAD measurements. The study is performed within a phenotyping experiment to monitor the development of wild tomato plants. While ML allows for exploring connections between ground-truth and spectral metrics, much remains unknown about capturing temporal correlations, selecting relevant predictors, and retrieving accurate results under different conditions. Several experiments were undertaken to evaluate multiple ML strategies, including: 1) exploring sequential vs. retraining learning; 2) comparing insights gained from using 272 spectral bands vs. 60 pigment-based vegetation indices (VIs); and 3) assessing six regression methods: linear, partial least squares regression (PLSR), decision trees, support vector, ensemble trees, and Gaussian process regression (GPR). Goodness-of-fit (R²) and accuracy metrics (MAE, RMSE) were determined using training/testing and validation data subsets to assess the models' performance. Comparative analysis between retrievals and validation data distributions informed the models' ability to capture Chl dynamics through SPAD levels.
Overall, while equally good performance was obtained using PLSR, GPR, or random forest, results show that: (a) the retraining strategy improved the ability to model SPAD-based Chl dynamics; (b) VI predictors slightly improved R² (e.g., from 0.59 to 0.74 for GPR) and accuracy (e.g., MAE and RMSE differences of up to 2 SPAD units); and (c) feature importance examined through these methods revealed strong overlaps between relevant bands and VI predictors. The best-performing models were used to produce multi-temporal SPAD-based chlorophyll maps at a pixel resolution of 7 mm.
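
The goodness-of-fit and accuracy metrics named in the abstract (R², MAE, RMSE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the SPAD values below are hypothetical and are not taken from the study. R² is computed here as the coefficient of determination, which is an assumption; the article may define it differently (for example, as a squared correlation).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and RMSE for paired observed/predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Mean absolute error and root-mean-square error, in the units of y (SPAD).
    mae = float(np.mean(np.abs(residuals)))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot (assumed definition).
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"R2": r2, "MAE": mae, "RMSE": rmse}

# Hypothetical SPAD readings and model retrievals, for illustration only.
spad_measured = [40.0, 45.0, 50.0, 55.0]
spad_predicted = [42.0, 44.0, 49.0, 57.0]
metrics = regression_metrics(spad_measured, spad_predicted)
print(metrics)
```

Because MAE and RMSE share the units of the target, differences such as the "up to 2 SPAD units" quoted above can be read directly from these two values, whereas R² is unitless.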