ORIGINAL RESEARCH article
Front. Agron.
Sec. Agroecological Cropping Systems
Network-Enhanced Machine Learning Framework for Multi-Crop Yield Prediction: A Comprehensive Analysis of Indian Agricultural Data
Provisionally accepted
Vellore Institute of Technology (VIT), Vellore, India
Accurate crop yield prediction is a cornerstone for food security, agricultural planning, and evidence-based policy design. In this work, we develop a network-enhanced machine learning framework that combines district similarity structures and crop co-occurrence patterns with rich temporal features to forecast yields for multiple crops across India. The empirical analysis relies on 52 years of district-level agricultural data (1966–2017) from 311 districts and focuses on six key crops: rice, wheat, maize, groundnut, cotton, and sugarcane. We construct two complementary network representations: a district similarity network derived from long-term yield trajectories (311 nodes, 2,996 edges, 6.2% density) and a crop co-occurrence network spanning 23 crops (253 edges). From these networks, we compute several centrality indicators and integrate them with temporal covariates, including lagged yields, rolling statistics, volatility measures, and diversification indices. We use a strict time-series cross-validation setup to compare simple baselines (Naive, Rolling Mean) with more advanced models (Ridge Regression, Random Forest, Gradient Boosting), both with and without network-based features. Among all evaluated models, Random Forest achieved the strongest performance for every crop, yielding R² values above 0.94 (rice: 0.988, wheat: 0.976, maize: 0.971, groundnut: 0.946, cotton: 0.969, sugarcane: 0.986). Statistical tests showed that the advanced models significantly outperformed the baselines for five of the six crops (p < 0.05). However, network features contributed less than 1% to overall feature importance, indicating that temporal patterns are the main drivers of prediction. Together with temporal stability checks and residual diagnostics, this evaluation setup offers a solid framework for agricultural forecasting and for designing practical crop yield prediction and decision-support systems.
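The network sizes reported above can be sanity-checked with a short calculation. The sketch below assumes both networks are simple undirected graphs without self-loops, which the abstract does not state explicitly; the function name `undirected_density` is illustrative and not taken from the article.

```python
def undirected_density(n_nodes: int, n_edges: int) -> float:
    """Density of a simple undirected graph: edges divided by possible node pairs.

    Assumption: no self-loops and no parallel edges.
    """
    possible_pairs = n_nodes * (n_nodes - 1) // 2
    return n_edges / possible_pairs


# District similarity network: 311 nodes, 2,996 edges (figures from the abstract).
district_density = undirected_density(311, 2996)

# Crop co-occurrence network: 23 crops, 253 edges (figures from the abstract).
crop_density = undirected_density(23, 253)

print(f"district similarity density: {district_density:.1%}")  # 6.2%
print(f"crop co-occurrence density:  {crop_density:.1%}")      # 100.0%
```

Under this assumption the district figure reproduces the reported 6.2% density, and 253 edges among 23 crops equals the number of distinct crop pairs, so the co-occurrence network would be fully connected.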
This study is primarily positioned as a rigorous benchmarking and methodological validation framework rather than a performance breakthrough, providing empirical evidence on the relative value of different feature-engineering strategies and establishing best practices for time-series cross-validation in agricultural machine learning. The finding that static network features provide negligible incremental value beyond temporal covariates is itself a significant contribution, guiding practitioners toward investments in data quality rather than complex network constructions.
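As a minimal sketch of what a strict time-series cross-validation setup with Naive and Rolling Mean baselines could look like, the example below uses an expanding training window in which every training year precedes every test year. The fold count, test size, rolling window, and the synthetic linear yield series are assumptions chosen for illustration; they are not the authors' configuration or data.

```python
from statistics import mean


def expanding_window_splits(years, n_splits, test_size):
    """Yield (train_years, test_years) folds; training always precedes testing."""
    years = sorted(years)
    for k in range(n_splits, 0, -1):
        test_end = len(years) - (k - 1) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough years for the requested folds")
        yield years[:test_start], years[test_start:test_end]


def naive_forecast(history):
    """Naive baseline: next year's yield equals the last observed yield."""
    return history[-1]


def rolling_mean_forecast(history, window=3):
    """Rolling Mean baseline: average of the last `window` observed yields."""
    return mean(history[-window:])


def one_step_mae(series, test_years, forecaster):
    """Mean absolute error of one-step-ahead forecasts using only past years."""
    errors = []
    for year in test_years:
        history = [series[y] for y in sorted(series) if y < year]
        errors.append(abs(series[year] - forecaster(history)))
    return mean(errors)


# Hypothetical yield series (kg/ha) for 1966-2017 with a steady linear trend.
series = {year: 1000.0 + 20.0 * (year - 1966) for year in range(1966, 2018)}

folds = list(expanding_window_splits(series.keys(), n_splits=3, test_size=5))
for train_years, test_years in folds:
    naive_mae = one_step_mae(series, test_years, naive_forecast)
    rolling_mae = one_step_mae(series, test_years, rolling_mean_forecast)
    print(test_years[0], test_years[-1], naive_mae, rolling_mae)
```

On this trending toy series the Naive baseline has a lower error than the Rolling Mean in every fold, because averaging past years lags further behind the trend. Any learned model would be evaluated on the same folds so that no future year leaks into training.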
Keywords: Agricultural informatics, Crop yield prediction, District similarity networks, Feature importance, machine learning, Network analysis, random forest, Time-series forecasting
Received: 15 Dec 2025; Accepted: 05 Feb 2026.
Copyright: © 2026 C and A. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Parthiban A
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
