ORIGINAL RESEARCH article

Front. Appl. Math. Stat.

Sec. Statistics and Probability

Decoding the Future of Agricultural Participation: Machine Learning Insights to Unravel the Plausible Triggers

Provisionally accepted
  • University of KwaZulu-Natal, Durban, South Africa

The final, formatted version of the article will be published soon.

As agricultural participation continues to shift under the pressures of urbanisation, climate change, and evolving socioeconomic conditions, understanding the drivers behind household engagement becomes increasingly vital. This study explores these dynamics using household budget survey data, applying decision trees, random forests, and gradient boosting to uncover trends in model performance and variable importance over time. Our comparative analysis reveals a consistent decline in decision tree accuracy, which reflects the model's limited ability to capture increasingly complex and non-linear relationships in household behaviour. In contrast, the ensemble learners (random forests and gradient boosting) combine many decision trees to improve predictive performance. Random forests aggregate the predictions of trees grown on bootstrap samples (bagging), while gradient boosting builds trees, typically shallow ones, sequentially so that each tree corrects the errors of those before it. These methods demonstrated superior sensitivity and balanced accuracy in identifying agricultural participants, particularly by 2017-2018, when random forests achieved a notably low out-of-bag error rate for classifying agricultural sales participants. However, specificity in the earlier survey years remained a challenge. Key predictors evolved from income-dominated variables in 2002-2003 to a more nuanced mix of household size, age, water access, and geographic context by 2017-2018. While all models identified overlapping predictors, ensemble methods were more effective in capturing subtle interactions and demographic shifts. Decision trees, though less accurate overall, provided valuable insights into spatial variation, especially in 2010-2011, when district-level factors were prominent. Rural households consistently showed higher participation rates, with urbanisation and regional disparities becoming increasingly influential.
These findings highlight the strength of ensemble learning in capturing the complexity of agricultural engagement and underscore the need for adaptive, data-driven policy strategies. The observed shifts in variable importance reflect a changing socioeconomic landscape, calling for targeted interventions that address local realities and emerging challenges such as climate volatility and rural-to-urban migration.
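The bagging, out-of-bag error, sensitivity, specificity, and balanced accuracy mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the data are synthetic, the two features (household size and a rural flag) and all function names are hypothetical, and one-split decision stumps stand in for full trees. It also performs bagging only, without the per-split feature subsampling that a random forest adds.

```python
import random
from collections import Counter


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and balanced accuracy for binary labels (1 = participant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def fit_stump(X, y):
    """Fit a one-split tree: pick the feature, threshold and labelling with fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for left_label in (0, 1):
                errors = sum(
                    1
                    for row, t in zip(X, y)
                    if (left_label if row[j] <= thr else 1 - left_label) != t
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, left_label)
    _, j, thr, left_label = best
    return j, thr, left_label


def predict_stump(stump, row):
    j, thr, left_label = stump
    return left_label if row[j] <= thr else 1 - left_label


def bagged_oob(X, y, n_trees=25, seed=0):
    """Bag stumps on bootstrap samples; score each row only with stumps that never saw it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample n rows with replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # row i is out-of-bag for this stump
                votes[i][predict_stump(stump, X[i])] += 1
    # Majority vote per row; rows that were never out-of-bag are skipped.
    scored = [(y[i], votes[i].most_common(1)[0][0]) for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no out-of-bag rows; increase n_trees")
    y_true = [t for t, _ in scored]
    y_pred = [p for _, p in scored]
    oob_error = sum(1 for t, p in scored if t != p) / len(scored)
    return oob_error, confusion_metrics(y_true, y_pred)


# Synthetic households (hypothetical): participation is more likely for larger rural households.
data_rng = random.Random(42)
X, y = [], []
for _ in range(80):
    size = data_rng.randint(1, 10)
    rural = data_rng.randint(0, 1)
    label = 1 if (rural == 1 and size >= 4) else 0
    if data_rng.random() < 0.1:  # 10% label noise
        label = 1 - label
    X.append((size, rural))
    y.append(label)

oob_error, metrics = bagged_oob(X, y)
print("OOB error:", round(oob_error, 3))
print({name: round(value, 3) for name, value in metrics.items()})
```

Because each bootstrap sample leaves out roughly a third of the rows, the out-of-bag error gives an estimate of generalisation error without a separate hold-out set. Balanced accuracy averages sensitivity and specificity, which makes it more informative than plain accuracy when participants are a minority class.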

Keywords: agricultural participation, classification, decision trees, ensemble, feature selection, machine learning, random forest

Received: 27 Aug 2025; Accepted: 27 Nov 2025.

Copyright: © 2025 Ramalebo, Chifurira, Zewotir and Chinhamu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Katiso Ramalebo

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.