ORIGINAL RESEARCH article

Front. Nutr.

Sec. Nutrition and Food Science Technology

Volume 12 - 2025 | doi: 10.3389/fnut.2025.1617491

Near-Infrared Spectroscopy prediction of dry matter and starch content in cassava using optimized calibration models

Provisionally accepted
Paulo Henrique Ramos Guimarães, Massaine Bandeira Sousa, Marcos de Souza Campos, Cinara Fernanda Garcia Morales, Eder Jorge Oliveira*
  • Embrapa Mandioca e Fruticultura, Cruz das Almas, Bahia, Brazil

The final, formatted version of the article will be published soon.

Dry matter content (DMC) and starch content (StC) in cassava roots are critical quality traits for breeding programs. However, traditional phenotyping methods are time-consuming and labor-intensive, which limits the scale of evaluation. Near-infrared (NIR) spectroscopy has emerged as a promising alternative for rapid, non-destructive phenotyping. This study aimed to develop calibration models for predicting DMC and StC and to compare the prediction accuracy of two NIR devices: a benchtop spectrometer [Büchi NIRFlex N-500 (NIR.N500); 1000-2500 nm] and a portable device [QualitySpec Trek (NIR.QST); 350-2500 nm]. The study also evaluated the impact of sample type (fresh vs. processed) on model performance. A total of 3,391 cassava clones from the Embrapa Mandioca e Fruticultura breeding program were analyzed between 2018 and 2023. DMC was estimated using two reference methods, gravimetric analysis (DMCg) and oven drying to constant weight (DMCo), and StC was measured via manual extraction. Spectral data were split into training (80%) and validation (20%) sets, and three machine learning algorithms were tested: Partial Least Squares (PLS), k-Nearest Neighbors (KNN), and eXtreme Gradient Boosting (XGB). KNN slightly outperformed PLS only for DMCg with the NIR.N500 device. XGB performed comparably to PLS in specific cases, for example for StC with the NIR.N500 (prediction accuracy of 0.89 for PLS vs. 0.88 for XGB) and for DMCo with the NIR.QST on processed samples (0.95 for PLS vs. 0.92 for XGB). The NIR.N500 provided the highest predictive accuracy across all traits. However, the NIR.QST also performed well with processed samples, highlighting its potential as a practical, portable solution for field-based phenotyping. External validation confirmed these trends: PLS models consistently offered the best predictive accuracy, and processed samples led to significantly improved model performance. For DMCg and StC, the NIR.QST slightly outperformed the NIR.N500 (0.74 and 0.76, respectively), while for DMCo the NIR.N500 achieved the highest accuracy (0.95), closely followed by the NIR.QST (0.93). Overall, using processed (mashed) samples substantially enhanced model performance, and the NIR.QST emerged as a reliable and efficient tool for cassava breeding programs.
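The calibration workflow summarized above (80/20 split, model fitting, accuracy on the held-out set) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline and rests on several assumptions: the spectra are simulated, "prediction accuracy" is taken to be the Pearson correlation between observed and predicted values in the validation set, PLS is implemented as PLS1 via the NIPALS algorithm, and all function names (`simulate`, `split_80_20`, `fit_pls1`, `predict_knn`, etc.) and settings (3 PLS components, k = 5) are hypothetical. XGB is omitted for brevity.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pearson_r(obs, pred):
    """Pearson correlation, used here as the (assumed) 'prediction accuracy'."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov / (vo * vp) ** 0.5


def simulate(n=150, p=30, seed=1):
    """Hypothetical spectra: each wavelength responds linearly to DMC plus a nuisance term."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.5) for _ in range(p)]   # made-up response to DMC
    b = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # made-up nuisance profile (e.g. scatter)
    X, y = [], []
    for _ in range(n):
        dmc = rng.uniform(20.0, 45.0)  # simulated DMC in %
        nuis = rng.gauss(0.0, 5.0)
        X.append([a[j] * dmc + b[j] * nuis + rng.gauss(0.0, 0.5) for j in range(p)])
        y.append(dmc + rng.gauss(0.0, 0.5))  # simulated reference-method error
    return X, y


def split_80_20(X, y, seed=2):
    """Random 80% training / 20% validation split of sample indices."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    tr, va = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in va], [y[i] for i in va]


def fit_pls1(X, y, n_comp):
    """PLS1 via NIPALS on mean-centered data; returns what predict_pls1 needs."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                  # y residuals
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # w proportional to E'f
        norm = dot(w, w) ** 0.5
        if norm == 0.0:  # y already fully explained; stop early
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                                  # deflate y
        comps.append((w, p_load, q))
    return x_means, y_mean, comps


def predict_pls1(model, x):
    """Apply the same sequential deflation to a new spectrum."""
    x_means, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_means)]
    y_hat = y_mean
    for w, p_load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
    return y_hat


def predict_knn(X_train, y_train, x, k=5):
    """Mean reference value of the k training spectra closest in Euclidean distance."""
    def sq_dist(row):
        return sum((u - v) ** 2 for u, v in zip(row, x))
    nearest = sorted(range(len(X_train)), key=lambda i: sq_dist(X_train[i]))[:k]
    return sum(y_train[i] for i in nearest) / k


X, y = simulate()
X_tr, y_tr, X_va, y_va = split_80_20(X, y)
pls = fit_pls1(X_tr, y_tr, n_comp=3)
r_pls = pearson_r(y_va, [predict_pls1(pls, x) for x in X_va])
r_knn = pearson_r(y_va, [predict_knn(X_tr, y_tr, x, k=5) for x in X_va])
print(f"validation r: PLS = {r_pls:.3f}, KNN = {r_knn:.3f}")
```

In a real calibration, the number of PLS components and k would normally be tuned by cross-validation within the training set, and a library such as scikit-learn (`PLSRegression`, `KNeighborsRegressor`) would typically replace these hand-written versions. The pure-Python form is used only to keep the sketch self-contained.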

Keywords: spectral data analysis, crop modeling, high-throughput phenotyping, root quality traits, selection

Received: 24 Apr 2025; Accepted: 16 Jun 2025.

Copyright: © 2025 Guimarães, Sousa, Campos, Morales and Oliveira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Eder Jorge Oliveira, Embrapa Mandioca e Fruticultura, Cruz das Almas, 44380-000, Bahia, Brazil

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.