ORIGINAL RESEARCH article
Front. Nutr.
Sec. Nutrition and Food Science Technology
This article is part of the Research Topic: Machine Learning Applications in Multi-Category Food Nutritional Assessment
Machine Learning and Near-Infrared Fusion-Driven Quantitative Characterization and Detection of Protein Content in Maize Kernels
Provisionally accepted
- 1Jiangsu University, Zhenjiang, China
- 2Nanjing Forestry University, Nanjing, China
This study aims to develop a rapid, non-destructive method for determining protein content in maize using Near-Infrared Spectroscopy (NIRS). To mitigate the effects of surface irregularities and uneven protein distribution in whole kernels on spectral measurements, maize powder was used as the test material, which improves the uniformity and stability of the spectral signals. A total of 90 maize powder samples were collected from major production regions across China, and a custom NIRS acquisition system was constructed. To optimise the spectral data, eight preprocessing methods, including Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), First Derivative (1D), Savitzky–Golay smoothing (S–G), and their combinations, were systematically evaluated. Traditional machine learning models (Partial Least Squares Regression, PLSR; Support Vector Machine, SVM) and deep learning models (ResNet-18, Transformer) were then developed to predict protein content, and their performance was compared. The combined preprocessing strategy of First Derivative and Multiplicative Scatter Correction (1D+MSC) was the most effective. Among the models, PLSR showed the best predictive performance, and the traditional chemometric methods showed greater practical utility than the deep learning models. To further improve model efficiency, four feature wavelength selection methods were applied: Partial Least Squares Regression Coefficients (PLSRC), Competitive Adaptive Reweighted Sampling (CARS), Successive Projections Algorithm (SPA), and Uninformative Variable Elimination (UVE). The PLSR model combined with SPA yielded the best performance, achieving a validation set correlation coefficient (Rp) of 0.927, a root mean square error of prediction (RMSEP) of 0.301, and a residual predictive deviation (RPD) of 2.502, along with the fastest computational speed.
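For readers unfamiliar with the three reported figures of merit, the following is a minimal sketch of how Rp, RMSEP, and RPD are commonly computed in chemometrics. The abstract does not state the exact formulas used, so the definitions here are assumptions: Rp as the Pearson correlation between reference and predicted values, RMSEP as the root mean squared prediction error, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function names and example values are hypothetical and are not data from the study.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values (assumed definition of Rp)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values over RMSEP (assumed definition)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    return sd / rmsep(y_true, y_pred)


# Hypothetical protein contents (%), for illustration only.
reference = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 9.5, 10.5, 11.5]
print(pearson_r(reference, predicted), rmsep(reference, predicted), rpd(reference, predicted))
```

Under these assumed definitions, the reported RPD of 2.502 with an RMSEP of 0.301 would correspond to a reference standard deviation of roughly 0.75 in the same units as the protein measurements.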
This study provides a reliable technical solution and theoretical foundation for the rapid and non-destructive detection of protein content in maize, while also validating the advantage of using powdered samples in improving the accuracy of NIRS detection.
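The 1D+MSC preprocessing combination named in the abstract can be illustrated with a short sketch. Several details are assumptions, since the abstract does not specify them: the derivative is taken here as a plain finite difference between adjacent wavelength points (the study may instead have used a Savitzky–Golay derivative), the derivative is applied before MSC, and MSC uses the mean spectrum of the set as its reference. The function names are hypothetical.

```python
def first_derivative(spectrum):
    """First difference between adjacent wavelength points (a simple stand-in for 1D)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum.

    Each spectrum is regressed on the mean spectrum by least squares
    (spectrum ~ intercept + slope * reference); the fitted offset and
    slope are then removed.
    """
    n = len(spectra)
    m = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / n for j in range(m)]
    ref_mean = sum(ref) / m
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def preprocess_1d_msc(spectra):
    """Apply the first derivative, then MSC (the order is an assumption)."""
    return msc([first_derivative(s) for s in spectra])


# Two hypothetical spectra that differ only by an offset and a scale factor.
raw = [[1.0, 2.0, 4.0, 3.0], [3.0, 5.0, 9.0, 7.0]]
print(preprocess_1d_msc(raw))
```

In this toy case the second spectrum is an offset and scaled copy of the first, so the two collapse onto the same corrected curve. That is the scatter-removal behaviour MSC is intended to provide.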
Keywords: near-infrared spectroscopy, maize powder, protein content, machine learning, detection
Received: 06 Oct 2025; Accepted: 27 Nov 2025.
Copyright: © 2025 Yu, Qiao, Fan, Dong and Cao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Chenlong Fan
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
