ORIGINAL RESEARCH article

Front. Plant Sci.

Sec. Sustainable and Intelligent Phytoprotection

This article is part of the Research Topic: Innovative Approaches in Remote Sensing for Precise Crop Yield Estimation: Advancements, Applications, and Future Directions

Multimodal Cross-Attention Network for Overgrowth Detection in Strawberry Seedlings

Provisionally accepted
Zhenzhen Cheng1, Yifan Cheng2, Tingting Fang1, Man Zhu1, Jing Liu1, Peng Qi3*, Qiaoyu Zhang1*
  • 1College of Horticulture, Xinyang Agriculture and Forestry University, Xinyang, China
  • 2Huazhong University of Science and Technology, Wuhan, China
  • 3Shandong Academy of Agricultural Machinery Sciences, Jinan, China

The final, formatted version of the article will be published soon.

Early warning of overgrowth in strawberry seedlings is essential to balance vegetative and reproductive growth. However, existing monitoring methods face major challenges, including subtle visual symptoms and limited abnormal samples. To address this, we propose MM-CAPNet, a multimodal fusion framework for early detection of seedling overgrowth. We first developed a representative sample collection of strawberry seedlings through a systematic induction experiment, integrating historical environmental time-series data with contemporaneous plant images. The MM-CAPNet architecture uses a dual-stream design to process these inputs, with a Transformer encoder for environmental sequences and a MobileNetV2 encoder for images. A critical component of the proposed framework lies in the image-guided Cross-Attention mechanism, which uniquely treats the current phenotype as an active query to adaptively retrieve and aggregate the most diagnostically relevant segments of past environmental data. Experiments show MM-CAPNet outperforms baselines, reaching 87.6% accuracy and 0.901 AUC, with strong discriminative ability for early overgrowth categories. Ablation studies confirm its interpretability by linking visual phenotypes to key environmental drivers. This work provides growers with a proof-of-concept framework to regulate fertilization, irrigation, and light management during the nursery stage, thereby reducing the risk of excessive vegetative growth. The proposed framework supports precision cultivation strategies that enhance resource efficiency and crop resilience.
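To make the image-guided cross-attention idea concrete, the following is a minimal sketch of single-head scaled dot-product attention in which an image embedding acts as the query and per-time-step environmental embeddings act as keys and values. It is not the authors' implementation: the function names, the toy vectors, and the omission of learned projection matrices, multiple heads, and the Transformer and MobileNetV2 encoders are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_guided_cross_attention(image_query, env_keys, env_values):
    """Hypothetical single-head cross-attention (illustrative only).

    image_query: list of d floats, standing in for the image embedding.
    env_keys:    T vectors of length d, one per environmental time step.
    env_values:  T vectors, one per environmental time step.
    Returns the aggregated context vector and the attention weights.
    """
    d = len(image_query)
    # Similarity between the image query and each time step, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(image_query, key)) / math.sqrt(d)
        for key in env_keys
    ]
    weights = softmax(scores)
    # Weighted sum of the value vectors across time steps.
    value_dim = len(env_values[0])
    context = [
        sum(w * value[j] for w, value in zip(weights, env_values))
        for j in range(value_dim)
    ]
    return context, weights


# Toy example: three time steps with made-up two-dimensional embeddings.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
values = [[10.0], [20.0], [30.0]]
context, weights = image_guided_cross_attention(query, keys, values)
```

In this toy setup the second time step is the one most aligned with the query, so it receives the largest weight. Inspecting such weights is one way an attention-based model can indicate which past environmental segments contributed most to a prediction.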

Keywords: strawberry, overgrowth, multimodal fusion, cross-attention, early warning

Received: 16 Sep 2025; Accepted: 10 Nov 2025.

Copyright: © 2025 Cheng, Cheng, Fang, Zhu, Liu, Qi and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Peng Qi, qipeng@saas.ac.cn
Qiaoyu Zhang, 2021190008@xyafu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.