ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Functional and Applied Plant Genomics

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1629794

This article is part of the Research Topic: Machine Learning for Mining Plant Functional Genes

SaGP: identifying plant saline-alkali tolerance genes based on machine learning techniques

Provisionally accepted
Baixue Qiao1,2,3, Wentao Gao4, Xudong Zhang1,3, Min Du1,3, Xuanrui Liu1,3, Shaozi Pang1,3, Chunxue Yang5, Jiang Wang1*, Yuming Zhao4*, Linan Xie2,6*
  • 1 Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education, Northeast Forestry University, Harbin, China
  • 2 School of Ecology, Northeast Forestry University, Harbin, China
  • 3 State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
  • 4 College of Computer and Control Engineering, Northeast Forestry University, Harbin, China
  • 5 College of Landscape Architecture, Northeast Forestry University, Harbin, China
  • 6 Key Laboratory of Sustainable Forest Ecosystem Management-Ministry of Education, School of Ecology, Northeast Forestry University, Harbin, China

The final, formatted version of the article will be published soon.

Mining novel genes underlying agronomic traits is a central task in plant biology, essential for enhancing crop quality, ensuring food security, and preserving biodiversity. Wet-lab experiments are the main way to uncover genes with target functions, but they are expensive and time-consuming. Machine learning, in contrast, can accelerate gene discovery by learning from accumulated data, making the process more efficient and cost-effective. Despite this potential, machine learning tools for mining stress-resistance genes in plants remain scarce. In this study, we developed SaGP (Saline-alkali Genes Prediction), to our knowledge the first machine learning model for identifying plant saline-alkali tolerance genes based on sequencing data. It outperformed a traditional computational tool, BLAST, and correctly identified the latest published genes. Moreover, we used SaGP to evaluate three recently published genes, GhAG2, MdBPR6, and TaCCD1, and it correctly identified the function of all three. Overall, these results suggest that SaGP can be used for the large-scale identification of saline-alkali tolerance genes and can serve as a framework for developing additional automated tools, thus promoting crop breeding and plant conservation. To enable efficient identification of saline-alkali tolerance genes in large-scale data, we developed a user-friendly, freely accessible web service platform based on SaGP (https://www.sagprediction.com/).

Keywords: machine learning, saline-alkali tolerance genes, gene mining, feature selection, SaGP

Received: 16 May 2025; Accepted: 26 Jun 2025.

Copyright: © 2025 Qiao, Gao, Zhang, Du, Liu, Pang, Yang, Wang, Zhao and Xie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Jiang Wang, Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education, Northeast Forestry University, Harbin, China
Yuming Zhao, College of Computer and Control Engineering, Northeast Forestry University, Harbin, China
Linan Xie, School of Ecology, Northeast Forestry University, Harbin, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.