
ORIGINAL RESEARCH ARTICLE

Front. Sustain. Food Syst.

Sec. Crop Biology and Sustainability

Volume 9 - 2025 | doi: 10.3389/fsufs.2025.1612009

Utilizing machine learning and bioinformatics analysis to identify drought stress responsive genes in wheat (Triticum aestivum L.)

Provisionally accepted
Jiabei He, Baoyue Cui, Pingzeng Liu, Xianyong Meng, Jun Yan*
  • Shandong Agricultural University, Taian, China

The final, formatted version of the article will be published soon.

Drought is one of the main abiotic stresses limiting agricultural output, and it has a substantial impact on wheat growth, development, and yield. This study uses machine learning and bioinformatics approaches to characterize transcriptomic changes in wheat leaves under drought stress, with the aim of offering new perspectives on the mechanisms of abiotic stress responses in wheat and identifying drought tolerance genes. First, publicly available RNA sequencing data on wheat drought stress were retrieved from public databases, followed by sequence alignment and expression quantification. Differentially expressed genes (DEGs) under drought stress were then identified through differential expression analysis. Next, a weighted gene co-expression network was constructed to identify key gene modules, and the performance of several machine learning models was compared. Finally, an improved Random Forest-Boruta (RF-Boruta) algorithm was used to identify key genes closely associated with the drought stress response. Differential expression analysis identified 16,754 DEGs, and the co-expression network revealed modules associated with the drought stress response. Among the machine learning models tested, random forest performed best at identifying drought stress-responsive genes. The improved RF-Boruta algorithm further selected candidate genes strongly associated with drought stress, raising model accuracy from 0.889 to 0.942 and the area under the curve (AUC) from 0.968 to 0.978. Gene enrichment analysis was also performed. By integrating bioinformatics and machine learning techniques, this study identified key genes strongly associated with the drought stress response in wheat, offering insights into the potential mechanisms underlying drought responses in wheat.
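The abstract names Boruta feature selection on top of a random forest but does not describe the authors' improvements, so the following is only a minimal sketch of plain (simplified) Boruta, not the improved RF-Boruta used in the study. It assumes X is a samples × genes expression matrix (for example, genes from a drought-related co-expression module) and y is a binary drought/control label. Each round appends "shadow" copies of every gene with its values shuffled across samples, fits a scikit-learn `RandomForestClassifier`, and counts a "hit" for each real gene whose impurity-based importance exceeds the best shadow. The function names `boruta_select` and `binom_upper_tail`, the fixed iteration count, and the Bonferroni-corrected one-sided binomial test are illustrative assumptions.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(int(k), n + 1))


def boruta_select(X, y, n_iter=30, alpha=0.05, n_estimators=200, seed=0):
    """Simplified Boruta (hypothetical helper, not the paper's improved RF-Boruta).

    X: (n_samples, n_genes) expression matrix; y: binary labels (e.g. drought vs control).
    Returns (indices of confirmed genes, per-gene hit counts).
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    hits = np.zeros(n_genes, dtype=int)
    for it in range(n_iter):
        # Shadow features: shuffle each column independently (axis=0),
        # which keeps its distribution but breaks any link with y.
        shadow = rng.permuted(X, axis=0)
        X_ext = np.hstack([X, shadow])
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed + it)
        rf.fit(X_ext, y)
        imp = rf.feature_importances_  # impurity-based importances
        # A "hit" means the real gene beats the best shadow in this round.
        hits += imp[:n_genes] > imp[n_genes:].max()
    # Under the null, a gene beats the best shadow about half the time;
    # confirm genes whose hit count is improbably high (Bonferroni-corrected).
    threshold = alpha / n_genes
    confirmed = [j for j in range(n_genes) if binom_upper_tail(hits[j], n_iter) < threshold]
    return confirmed, hits
```

On synthetic data where the label depends only on the first two columns, those two columns should be confirmed and the noise columns rejected. The full Boruta procedure instead tests after every round, permanently rejects clearly unimportant features, and stops early once every feature is decided; the fixed-round version above trades that efficiency for brevity.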

Keywords: Drought stress, machine learning, transcriptome analysis, co-expression network, wheat

Received: 15 Apr 2025; Accepted: 07 Jul 2025.

Copyright: © 2025 He, Cui, Liu, Meng and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jun Yan, Shandong Agricultural University, Taian, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.