ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Technical Advances in Plant Science

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1604088

Enhancing Buckwheat Maturity Classification with Generative Adversarial Networks for Spectroscopy Data Augmentation

Provisionally accepted
HuiHui Wang, Xiaoxue Che, Jiaxuan Nan, Yuyuan Miao, Yaqi Wang, Wuping Zhang, Fuzhong Li*, Jiwan Han*
  • College of Software, Shanxi Agricultural University, Jinzhong, Shanxi Province, China

The final, formatted version of the article will be published soon.

The optimal harvest period for buckwheat is difficult to determine because of the crop's short growth cycle, and harvesting too early or too late can reduce crop quality. Traditional harvest methods are labor-intensive and fail to account for the spatial variability in buckwheat quality within a field. This study explores the use of near-infrared (NIR) spectral data to classify the maturity stages of buckwheat. Four distinct developmental stages were examined: UM (Unripe Maturity), representing buckwheat harvested at 65 days after sowing; HM (Half Maturity), harvested at 75 days; MS (Full Maturity with Shell), harvested at 85 days with husks intact; and MUS (Full Maturity Unhulled Sample), also harvested at 85 days but manually dehulled. Because traditional machine learning models require diverse and extensive datasets, this study investigates the use of a conditional Wasserstein GAN with gradient penalty (WGAN-GP) to generate synthetic data and improve model performance. Four machine learning models were employed: Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), and Partial Least Squares Linear Discriminant Analysis (PLS-LDA). The conditional WGAN-GP was trained for 1,000, 2,000, 8,000, 10,000, and 20,000 epochs. After 10,000 training epochs, the synthetic hyperspectral reflectance data closely resembled the real spectra for each maturity category. To assess the impact of conditional WGAN-GP data augmentation, model performance was first evaluated on the original dataset as a baseline; PLS-LDA gave the best classification performance, with an accuracy of 95% and a kappa coefficient of 0.93. The models were then trained on a combination of original and synthetic data, which showed that synthetic data improved classification performance for RF and KNN. The best classification performance was achieved by RF, with an accuracy of 97% and a kappa coefficient of 0.94. This study demonstrates the effectiveness of synthetic data in enhancing classification accuracy.
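
The abstract reports classifier quality as accuracy together with a kappa coefficient. As a rough illustration of how the two metrics relate, the following minimal sketch computes accuracy and Cohen's kappa from a list of true and predicted labels. The function name `accuracy_and_kappa` and the label lists are hypothetical and were chosen only for illustration; they are not the study's data or code, and the sketch assumes the reported kappa is the standard unweighted Cohen's kappa.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples classified correctly (accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of the true and predicted
    # class frequencies, summed over all classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when both sequences contain a single shared class.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical example (not study data): four maturity classes with five
# samples each, and one UM sample misclassified as HM.
stages = ["UM", "HM", "MS", "MUS"]
y_true = [s for s in stages for _ in range(5)]
y_pred = list(y_true)
y_pred[0] = "HM"

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.95, kappa=0.93
```

Kappa corrects accuracy for the agreement expected by chance, so in this balanced four-class example it sits slightly below the raw accuracy.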

Keywords: buckwheat, spectroscopy, machine learning, generative adversarial networks, NIR, precision agriculture

Received: 01 Apr 2025; Accepted: 23 Jun 2025.

Copyright: © 2025 Wang, Che, Nan, Miao, Wang, Zhang, Li and Han. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Fuzhong Li, College of Software, Shanxi Agricultural University, Jinzhong, 030801, Shanxi Province, China
Jiwan Han, College of Software, Shanxi Agricultural University, Jinzhong, 030801, Shanxi Province, China
