ORIGINAL RESEARCH article
Front. Sustain. Food Syst.
Sec. Crop Biology and Sustainability
Volume 9 - 2025 | doi: 10.3389/fsufs.2025.1612009
Utilizing machine learning and bioinformatics analysis to identify drought stress responsive genes in wheat (Triticum aestivum L.)
Provisionally accepted
Shandong Agricultural University, Taian, China
Drought is one of the main abiotic stresses limiting agricultural output, and it substantially affects wheat growth, development, and yield. This study uses machine learning and bioinformatics approaches to characterize transcriptomic changes in wheat leaves under drought stress, with the aim of clarifying the mechanisms of abiotic stress response in wheat and identifying candidate drought-tolerance genes. First, publicly available RNA sequencing data on wheat drought stress were retrieved from public databases, followed by sequence alignment and expression quantification. Differentially expressed genes (DEGs) under drought stress were then identified through differential expression analysis. Next, a weighted gene co-expression network was constructed to determine key gene modules, and the performance of multiple machine learning models was compared. Finally, an improved Random Forest-Boruta (RF-Boruta) algorithm was used to select key genes closely associated with the drought stress response. The differential expression analysis identified 16,754 DEGs, and the co-expression network yielded modules related to the drought stress response. Among the machine learning models compared, the random forest algorithm performed best at identifying drought stress-responsive genes. Feature selection with the improved RF-Boruta algorithm narrowed these to candidate genes highly related to drought stress and raised model accuracy from 0.889 to 0.942 and the area under the curve (AUC) from 0.968 to 0.978. Gene enrichment analysis was also conducted. By integrating bioinformatics and machine learning techniques, this study identified key genes strongly associated with the drought stress response in wheat and provides insight into its potential mechanisms.
Keywords: drought stress, machine learning, transcriptome analysis, co-expression network, wheat
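The abstract does not describe how the improved RF-Boruta algorithm works internally. As a rough illustration of the general idea behind Boruta-style selection, the sketch below compares each feature against shuffled "shadow" copies and keeps only the features that consistently score higher than the best shadow. Everything in it is an assumption made for illustration: the gene names and toy data are hypothetical, a simple class-mean gap stands in for random forest importance, and a fixed hit-rate cutoff replaces the statistical test used by the actual Boruta procedure. It is not the authors' implementation.

```python
import random
from statistics import mean


def class_gap(column, labels):
    """Stand-in importance score: absolute difference between class means.

    Assumption: the study uses random forest importance; this simpler
    univariate score is used here only to keep the sketch dependency-free.
    """
    pos = [v for v, y in zip(column, labels) if y == 1]
    neg = [v for v, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def shadow_select(features, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shadow score in most rounds.

    `features` maps a (hypothetical) gene name to a list of expression values.
    Real scores are computed once because the stand-in score is univariate;
    a model-based importance would be recomputed in every round.
    """
    rng = random.Random(seed)
    real = {name: class_gap(col, labels) for name, col in features.items()}
    hits = dict.fromkeys(features, 0)
    for _ in range(n_rounds):
        # Shadow features: shuffled copies that keep each column's value
        # distribution but break any link to the labels.
        shadow_scores = []
        for col in features.values():
            shuffled = col[:]
            rng.shuffle(shuffled)
            shadow_scores.append(class_gap(shuffled, labels))
        threshold = max(shadow_scores)
        for name, score in real.items():
            if score > threshold:
                hits[name] += 1
    return sorted(n for n, h in hits.items() if h / n_rounds >= min_hit_rate)


# Toy data: 20 control samples (label 0) and 20 drought samples (label 1).
data_rng = random.Random(1)
labels = [0] * 20 + [1] * 20
features = {
    # gene_A shifts strongly under drought; gene_B and gene_C are pure noise.
    "gene_A": [10.0 * y + data_rng.gauss(0, 1) for y in labels],
    "gene_B": [data_rng.gauss(0, 1) for _ in labels],
    "gene_C": [data_rng.gauss(0, 1) for _ in labels],
}
selected = shadow_select(features, labels)
print(selected)
```

In this toy setup the strongly shifted `gene_A` is retained, because no shuffled column comes close to its class gap. Whether a noise feature survives depends on the cutoff and the number of rounds, which is why the full Boruta procedure relies on a statistical test rather than a fixed threshold.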
Received: 15 Apr 2025; Accepted: 07 Jul 2025.
Copyright: © 2025 He, Cui, Liu, Meng and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Jun Yan, Shandong Agricultural University, Taian, China
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.