MINI REVIEW article

Front. Plant Sci.

Sec. Functional and Applied Plant Genomics

Volume 16 - 2025 | doi: 10.3389/fpls.2025.1611992

This article is part of the Research TopicMachine Learning for Mining Plant Functional GenesView all 3 articles

Foundation models in plant molecular biology: Advances, challenges, and future directions

Provisionally accepted
Feng  XuFeng XuTianhao  WuTianhao WuQian  ChengQian ChengXiangfeng  WangXiangfeng WangJun  YanJun Yan*
  • China Agricultural University, Beijing, China

The final, formatted version of the article will be published soon.

A foundation model (FM) is a neural network trained on large-scale data using unsupervised or self-supervised learning, capable of adapting to a wide range of downstream tasks. This review provides a comprehensive overview of FMs in plant molecular biology, emphasizing recent advances and future directions. It begins by tracing the evolution of biological FMs across the DNA, RNA, protein, and single-cell levels, from tools inspired by natural language processing (NLP) to transformative models for decoding complex biological sequences. The review then focuses on plantspecific FMs such as GPN, AgroNT, PDLLMs, PlantCaduceus, and PlantRNA-FM, which address challenges that are widespread among plant genomes, including polyploidy, high repetitive sequence content, and environment-responsive regulatory elements, alongside universal FMs like GENERator and Evo 2, which leverage extensive cross-species training data for sequence design and prediction of mutation effects. Key opportunities and challenges in plant molecular biology FM development are further outlined, such as data heterogeneity, biologically informed architectures, cross-species generalization, and computational efficiency. Future research should prioritize improvements in model generalization, multi-modal data integration, and computational optimization to overcome existing limitations and unlock the potential of FMs in plant science. This review serves as an essential resource for plant molecular biologists and offers a clear snapshot of the current state and future potential of FMs in the field.

Keywords: Foundation model, plant, Molecular Biology, Large Language Model, transformer

Received: 15 Apr 2025; Accepted: 16 May 2025.

Copyright: © 2025 Xu, Wu, Cheng, Wang and Yan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Jun Yan, China Agricultural University, Beijing, China

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.