- ¹Research Center for Agroindustry, National Research and Innovation Agency of the Republic of Indonesia, Jakarta, Indonesia
- ²Regional Development Planning, Research and Innovation Agency of Magelang City, Central Java, Indonesia
- ³Research Center for Appropriate Technology, National Research and Innovation Agency of the Republic of Indonesia, Jakarta, Indonesia
- ⁴Research Center for Climate and Atmosphere, National Research and Innovation Agency of the Republic of Indonesia, Jakarta, Indonesia
- ⁵Research Center for Process and Manufacturing Industry Technology, National Research and Innovation Agency of the Republic of Indonesia, Jakarta, Indonesia
Rice productivity in Indonesia is strongly influenced by variations in environmental conditions, land management, and resource availability, creating disparities between high- and low-productivity areas. This study aims to segment regions based on rice productivity using data-driven clustering analysis to identify key patterns and influencing factors. A descriptive quantitative design was employed, applying Random Forest Clustering to annual rice productivity data (1986–2023) from 29 districts in Central Java, Indonesia, sourced from the Ministry of Agriculture. Data preprocessing, clustering, and visualization were conducted using JASP software. Model optimization used the Bayesian Information Criterion (BIC), and performance was evaluated via the Silhouette Score, Dunn Index, and Calinski-Harabasz Index. Three clusters emerged: high (mean = 6.8 quintals/ha), medium (4.5 quintals/ha), and low (2.9 quintals/ha). The model showed a Dunn Index of 0.396 and Calinski-Harabasz Index of 10.088, with Silhouette Scores ranging from 0.143 to 0.207, indicating moderate cluster separation. Results reveal a strong association between rice productivity and land management, environmental conditions, and agricultural inputs. This data-driven approach enables targeted interventions and supports evidence-based agribusiness strategies to optimize rice production in Indonesia.
1 Introduction
Central Java Province, Indonesia—recognized as a national rice granary—plays a strategic role in ensuring national food security yet faces challenges in sustaining stable rice production amid fluctuating harvest areas and yields. In 2024, the harvested area is projected to decline by 5.36% (to 1.55 million ha) and production by 2.12% (to 8.89 million tons of milled dry grain) compared to 2023, despite a 34.23% increase in the September–December harvest period driven by expanded planting in June–August. These dynamics reflect underlying disparities in environmental conditions, land management, and resource use across regions, which contribute to persistent productivity gaps. Addressing these challenges requires integrated, sustainable strategies such as in situ rice residue management to enhance soil organic carbon (Kumar et al., 2022), efficient nitrogen use in rice-wheat systems (Chaki et al., 2022), Alternate Wetting and Drying with organic amendments for water efficiency (Vicente et al., 2025), conservation agriculture, crop diversification, and mechanization (Jat et al., 2020; Unjia et al., 2024). Digital tools like Nutrient Expert and Rice Advice enable site-specific nutrient management to optimize fertilization and reduce pollution (Chivenge et al., 2021), while Integrated Crop Management supports resource efficiency and soil health (Choudhary et al., 2020). Furthermore, climate-resilient practices are essential for boosting yields in rainfed systems (Saito et al., 2023). Understanding these key productivity drivers is vital for enhancing agricultural efficiency and reinforcing Central Java’s role in national food security amidst climate change and rising food demand.
Agricultural productivity is shaped by the heterogeneity of agronomic resources and practices, posing the challenge of identifying key drivers and translating complex data into actionable strategies. Prior research underscores the value of data-driven clustering for effective agricultural zoning—such as identifying similar response units using environmental and topographic data in Ethiopia (Tamene et al., 2022) and characterizing rainfed wheat zones under long-term climate variability (Gelagay et al., 2025). Remote sensing, particularly Sentinel satellite data combined with random forest algorithms, enables high-accuracy (>90%) prediction of soil properties like pH, organic matter, and clay content (Yuzugullu et al., 2020), and supports clustering based on seasonal climate patterns. At the field level, studies in Kenya reveal that manure application shapes zinc-solubilizing microbial communities, forming distinct clusters linked to fertilization practices (Bolo et al., 2023). Regional analyses also employ clustering: K-means and discriminant analysis classify household gardens in Sri Lanka (Kuruppuarachchi et al., 2023), while random forest and structural equation modeling elucidate wheat productivity drivers in China’s drylands (Chen et al., 2025). However, many existing studies rely on single-factor analyses or predictive models that overlook variable interdependencies. This research addresses that gap by applying a Random Forest–based clustering approach to multidimensionally identify key productivity factors, offering novel scientific insights and practical tools for agribusiness actors to design precise land management strategies, enhance production efficiency, and inform sustainable agricultural policies.
2 Method
This study uses a descriptive quantitative design with a clustering analysis based on Random Forest Clustering (Tibshirani et al., 2001). The goal is to group 29 districts by rice productivity (quintal/ha) using secondary data available from 1986 to 2023, to identify productivity patterns, and to relate them to agribusiness factors. The research covers the 29 districts of Central Java Province, Indonesia, that are registered in the rice productivity data of the Ministry of Agriculture of the Republic of Indonesia. The data record rice productivity under real conditions and therefore reflect variations in the environment, land management, and farmer resources. The independent variables are the annual rice productivity values (V1986, V1987, …, V2023) for each district. The dependent variable is the productivity cluster (Ck), which indicates whether a district falls in the high (C1), medium (C2), or low (C3) cluster. Productivity data were obtained from official publications of the Ministry of Agriculture of the Republic of Indonesia and processed in JASP for preprocessing, clustering, and visualization of results. Annual data were normalized so that variables on larger scales do not dominate the analysis. Outliers were identified and eliminated using the Interquartile Range (IQR) method. The Random Forest Clustering model was used to determine the pattern of productivity clusters. This model uses the Gini index reduction function to form clusters:
$$G = 1 - \sum_{j=1}^{k} p_j^{2} \tag{1}$$
where $G$ is the Gini index; $p_j$ is the proportion of data in category $j$ in the cluster; and $k$ is the total number of categories in the cluster. The optimal number of clusters ($K$) is selected based on the BIC:
$$\mathrm{BIC} = k \ln(n) - 2 \ln(l) \tag{2}$$
where $n$ is the number of observations; $k$ is the number of parameters in the model; and $l$ is the likelihood of the model. The model was then evaluated using three main metrics. First, the Silhouette Score ($S$), where $a$ is the average distance within the cluster and $b$ is the closest distance to another cluster, is computed as:
$$S = \frac{b - a}{\max(a, b)} \tag{3}$$
The Silhouette Score ranges from –1 to 1. Values close to 1 indicate well-separated and cohesive clusters, whereas scores below 0.25 generally suggest weak cluster structure or significant overlap between groups. Second, the Dunn Index ($D$) is computed as:
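$$D = \frac{\min_{i \neq j} \delta(C_i, C_j)}{\max_{k} \Delta(C_k)} \tag{4}$$
where $\delta(C_i, C_j)$ is the distance between clusters $C_i$ and $C_j$, so the numerator is the minimum distance among clusters, and $\Delta(C_k)$ is the diameter of cluster $C_k$, so the denominator is the maximum cluster diameter. Third, the Calinski–Harabasz Index (CH) is used: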
$$\mathrm{CH} = \frac{\mathrm{BSS} / (K - 1)}{\mathrm{WSS} / (n - K)} \tag{5}$$
where BSS is the between-cluster sum of squares, WSS is the within-cluster sum of squares, $K$ is the number of clusters, and $n$ is the number of observations. To assess methodological robustness, we also explored alternative clustering approaches, including hierarchical clustering and DBSCAN, as well as dimensionality reduction using Principal Component Analysis (PCA) prior to clustering. However, these alternatives either produced less interpretable clusters (particularly PCA, which obscured the policy-relevant meaning of individual annual variables) or yielded similar or lower internal validity scores (Silhouette: 0.13–0.19; Dunn: 0.32–0.38). Therefore, the K-means approach guided by Random Forest feature importance was retained for its balance of stability, transparency, and policy relevance.
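The three internal validation metrics can be illustrated with a minimal Python sketch. It is not the JASP implementation used in the study: the function names and the toy points are hypothetical, and taking δ as the smallest point-to-point distance between two clusters and Δ as the largest pairwise distance within a cluster is one common convention, assumed here.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette_scores(points, labels):
    """Per-point silhouette S = (b - a) / max(a, b)."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton cluster: score set to 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return scores


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum cluster diameter."""
    groups = list(group_by_label(points, labels).values())
    min_between = min(dist(p, q)
                      for i, g in enumerate(groups) for h in groups[i + 1:]
                      for p in g for q in h)
    max_diameter = max(dist(p, q) for g in groups for p in g for q in g)
    return min_between / max_diameter


def calinski_harabasz(points, labels):
    """CH = [BSS / (K - 1)] / [WSS / (n - K)]."""
    clusters = group_by_label(points, labels)
    n, dims, K = len(points), len(points[0]), len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dims)]
    bss = wss = 0.0
    for g in clusters.values():
        centroid = [sum(p[d] for p in g) / len(g) for d in range(dims)]
        bss += len(g) * dist(centroid, overall) ** 2
        wss += sum(dist(p, centroid) ** 2 for p in g)
    return (bss / (K - 1)) / (wss / (n - K))


# Toy data (not from the study): two compact, well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
sil = silhouette_scores(points, labels)
dunn = dunn_index(points, labels)
ch = calinski_harabasz(points, labels)
print(sum(sil) / len(sil), dunn, ch)
```

On well-separated toy data all three metrics are high; overlapping groups such as those reported in this study would give much smaller values.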
The average productivity of each cluster is computed as:
$$\bar{P}_k = \frac{1}{n_k} \sum_{i \in C_k} P_{i,t} \tag{6}$$
where $\bar{P}_k$ is the average productivity of cluster $k$; $n_k$ is the number of districts in cluster $k$; and $P_{i,t}$ is the productivity in district $i$ in year $t$. To complement the results, Figure 1a shows the initial stage of cluster formation with the initial centroids, and Figure 1b shows the final clustering result, dividing the region into high, medium, and low productivity clusters. This approach is expected to provide deeper insights for decision-making in the agribusiness sector related to rice productivity.
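The normalization and IQR screening steps described in this section can be sketched as follows. The text does not state which normalization or which IQR multiplier was used, so min-max scaling and the conventional 1.5 × IQR fences are assumptions here, as are the function names, the illustrative values, and the order in which the two steps are applied.

```python
from statistics import quantiles


def min_max_normalize(values):
    """Rescale one year's productivity values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Illustrative values only (not taken from the study's dataset)
raw = [50.0, 52.0, 54.0, 56.0, 58.0, 120.0]
mask = iqr_outlier_mask(raw)
kept = [v for v, is_outlier in zip(raw, mask) if not is_outlier]
scaled = min_max_normalize(kept)
print(mask, scaled)
```

Screening before scaling keeps a single extreme value from compressing the remaining observations into a narrow band of the normalized range.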
3 Results
The results of the Random Forest Clustering model are summarized in Table 1, focusing on the relationship among cluster characteristics and production metrics. The model is optimized using the BIC and utilizes features that are ranked based on the average decrease in the Gini index to determine clusters.
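As a minimal sketch of how BIC-based selection of the number of clusters works, the snippet below applies BIC = k ln(n) − 2 ln(l) to a set of candidate models. The log-likelihoods and parameter counts are hypothetical placeholders chosen only to show the trade-off; they are not values from this study, and `bic` is an illustrative helper rather than a JASP function.

```python
from math import log


def bic(log_likelihood, n_params, n_obs):
    """BIC = k ln(n) - 2 ln(l); lower values indicate a better trade-off."""
    return n_params * log(n_obs) - 2.0 * log_likelihood


# Hypothetical (log-likelihood, parameter count) pairs for K = 2, 3, 4
candidates = {2: (-410.0, 40), 3: (-330.0, 60), 4: (-325.0, 80)}
n_districts = 29  # number of observations, as in the study design

scores = {K: bic(ll, k, n_districts) for K, (ll, k) in candidates.items()}
best_K = min(scores, key=scores.get)
print(scores, best_K)
```

Moving from three to four clusters improves the likelihood only slightly in this example, so the added parameter penalty outweighs the gain and the three-cluster model has the lowest BIC.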
Table 1 presents a three-cluster segmentation of rice productivity. All silhouette scores fall below the conventional threshold of 0.25 (Rousseeuw, 1987), indicating that the resulting clusters are not well separated and exhibit considerable overlap—reflecting the inherent continuity and spatial heterogeneity of rice productivity across districts. Cluster 2 contains the most districts (13) and exhibits the highest heterogeneity (46.2%) but the lowest silhouette score (0.144), suggesting weak internal cohesion and high within-cluster variability; Cluster 1, though more homogeneous (24.9% heterogeneity), shows the highest average productivity and strong association with the most important predictive features—V1995 and V1988—identified via Gini index decline in a Random Forest model; Cluster 3, despite low productivity, achieves the best internal cohesion (silhouette score = 0.207), though this value still indicates only marginal cluster cohesion. Overall, internal validation metrics suggest limited cluster validity: the Silhouette scores range from 0.143 to 0.207, and the Dunn index is modest (0.396), consistent with overlapping or diffuse group boundaries. The Calinski–Harabasz index (10.088) further supports this interpretation, as higher values (>100) are typically associated with well-separated clusters. Despite these statistical limitations, the observed patterns may still inform differentiated policy strategies. This approach aligns with recent studies that use clustering for agricultural zoning despite imperfect separation—such as salinity- and climate-based clustering in Bangladesh (Carcedo et al., 2022), in China where spatial machine learning mapped rice productivity drivers (Wang et al., 2024), in Colombia where cluster-informed simulation models optimized water and economic outcomes (Terán-Chaves and Polo-Murcia, 2021), and in studies evaluating ecological efficiency through emission-based clustering (Wang et al., 2024). 
Feature importance based on Gini scores is widely applied: in Indonesia for commodity mapping (Condro et al., 2020), in India for crop-type prediction (Bhuyan et al., 2022), in China for rice field quality assessment (Wang et al., 2021), in agronomic optimization (Jia et al., 2024), and in drought early warning in Thailand and rodent mound detection via remote sensing (Qi et al., 2024). It therefore provides a robust foundation for context-specific agricultural interventions.
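The idea behind ranking annual variables by their decrease in the Gini index can be sketched with a single split rather than a full forest. This is a simplified illustration under stated assumptions: a Random Forest averages such decreases over many trees and bootstrap samples, whereas the hypothetical helper `best_gini_decrease` below only evaluates the best single threshold for one feature, and the feature values are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G = 1 - sum_j p_j^2 of a list of cluster labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(feature, labels):
    """Largest impurity decrease over all threshold splits of one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for threshold in sorted(set(feature))[:-1]:  # largest value cannot split
        left = [lab for x, lab in zip(feature, labels) if x <= threshold]
        right = [lab for x, lab in zip(feature, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


# Made-up values: year_a tracks the cluster labels, year_b does not
labels = ["high", "high", "medium", "medium", "low", "low"]
year_a = [6.9, 6.7, 4.6, 4.4, 3.0, 2.8]
year_b = [5.0, 3.0, 5.1, 2.9, 5.2, 3.1]

decrease_a = best_gini_decrease(year_a, labels)
decrease_b = best_gini_decrease(year_b, labels)
print(decrease_a, decrease_b)
```

A year whose values line up with the cluster labels produces a larger impurity decrease, which is why such years rank higher in importance.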
Table 1. Summary of production metrics among clusters and the importance of features based on the average decline of the Gini index (Equations 1–6).
Figure 1a presents the initial stage of cluster formation using the Elbow Method, where red dots indicate cluster centroids derived from agricultural production data, marking the first step in grouping regions based on shared attributes such as productivity levels, input use, and environmental conditions.
Figure 1b displays the final clustering outcome through a t-SNE plot, visually distinguishing three clusters by color: blue (high productivity), green (moderate productivity), and pink (low productivity). While the t-SNE plot suggests visual separation, this should be interpreted cautiously given the low Silhouette and Dunn scores; the apparent grouping may reflect local density patterns rather than statistically distinct clusters. The blue cluster reflects favorable land management and supportive environmental factors; the green cluster suggests limitations in inputs or suboptimal agronomic conditions; and the pink cluster points to resource constraints or adverse environmental challenges. These groupings are best viewed as analytical segments that capture dominant productivity patterns rather than discrete, non-overlapping categories. Together, these visualizations demonstrate the model’s ability to meaningfully segment regions by productivity patterns, providing a foundation for targeted, cluster-specific agricultural strategies and policy interventions.
The Random Forest Clustering results group the 29 regencies/cities in Central Java into three clusters based on development patterns from 1986 to 2023 (Figure 2). Cluster 1 (pink) includes regions with the highest performance, such as Klaten and Sukoharjo, which tend to be located in urban zones or near economic centers. Cluster 2 (blue) represents areas with moderate and fluctuating characteristics, widely distributed from the northern coast to the central highlands. Cluster 3 (green) indicates regions with relatively lower development levels, predominantly in the southern and western parts of the province, often associated with accessibility and infrastructure challenges. This spatial distribution reflects significant regional disparities and provides an empirical foundation for formulating targeted, place-based development policies.
Figure 2. Spatial distribution of regency/city clusters in Central Java Province based on the results of Random Forest Clustering (1986–2023).
4 Discussion
The results showed that grouping areas based on rice production data produced three main clusters (Figure 1b), each with different productivity characteristics. The blue cluster reflects a high productivity region, supported by optimal input management and good agronomic conditions. The green cluster represents areas of moderate productivity, where environmental or management factors may be less than optimal. The pink cluster indicates areas with low productivity, most likely affected by resource limitations such as irrigation, soil quality, or technology. However, it is important to acknowledge that the statistical validity of these clusters is limited. All Silhouette scores fall below the widely accepted threshold of 0.25 (Rousseeuw, 1987), and the Dunn index (0.396) remains modest, indicating considerable overlap and weak separation among groups. This likely reflects the continuous and spatially gradual nature of rice productivity across Central Java, where sharp boundaries between “types” of districts are uncommon. Rather than discrete categories, the clusters should be interpreted as analytical segments that capture dominant patterns within a heterogeneous landscape. The international literature supports this cluster-based approach to understanding spatial variation in productivity. Chaali et al. (2024) used unsupervised machine learning and geophysical data to delineate specific management zones in rice fields, which are very useful in irrigation optimization. In China, a grid-based spatial evaluation approach has been used to assess the sustainability of soil quality with indicators such as erosion, nutrient balance, and organic carbon, which correlate with differences in productivity among regions (Zhou et al., 2023). Irrigation technologies such as drip irrigation under plastic mulch (DIPM) have been shown to improve water efficiency and crop productivity in resource-constrained areas (Wang et al., 2024).
Other research shows that efficient irrigation plays an important role in reducing non-point source pollution while supporting sustainable agriculture (Gao et al., 2024). In the context of precision management, Site-Specific Crop Management (SSCM) allows agricultural inputs to be applied according to land variability, supporting productivity efficiency and sustainability, especially in areas with moderate productivity (Ali et al., 2024). In addition, agronomic diversity assessments in Kazakhstan indicate that environmental variables such as temperature and rainfall are the main limiting factors of regional productivity (D. Wang et al., 2022). Hammond et al. (2023) used a UAV-based Leaf Area Index (LAI) to assess spatial variation in evapotranspiration, which has direct implications for zone-based irrigation practices in feed crops such as alfalfa; the principle is also relevant for rice. Finally, an analysis of the carbon footprint of agriculture in China shows a strong spatial relationship between agronomic practices and emissions, supporting a cluster approach in designing low-emission production policies (Xiao et al., 2025). These findings show that differences in management and environmental factors can be explained through clustering analysis, as long as the focus remains on policy-relevant patterns rather than strict statistical discreteness.
This research advances theoretical understanding of how heterogeneous agribusiness factors influence rice productivity by applying Random Forest Clustering—a method that extends the literature on data analytics for identifying spatial-temporal productivity patterns. Although alternative clustering algorithms (e.g., DBSCAN, hierarchical clustering) and dimensionality reduction techniques (e.g., PCA) were explored, they did not yield substantially better internal validation metrics nor greater interpretability in the policy context. This reinforces the view that in highly heterogeneous agricultural systems, the goal of clustering may be less about achieving statistical purity and more about surfacing actionable typologies for decision-making. The findings reinforce the necessity of a multidimensional approach in analyzing the interplay among agricultural inputs, environmental conditions, and output, thereby strengthening theories of agricultural production efficiency. Recent studies corroborate this: (Yunis et al., 2024) demonstrated that combining Random Forest with clustering accurately predicts rice planting seasons, while (Anand et al., 2025) identified Random Forest and XGBoost as the most reliable models for yield and soil health prediction (up to 99% accuracy). In the realm of agricultural big data, (Chergui and KeChadi, 2022) highlighted machine learning–based data mining as essential for data-driven decisions, and (Nayak et al., 2024) showed that context-specific, analytics-driven interventions in India can triple rice yields through optimized fertilization and irrigation. (Folorunso et al., 2025) further emphasized micro-agronomic data via GeaGrow—an ANN-based tool for soil nutrient prediction—while (Karunathilake et al., 2023) reviewed how AI, drones, and sensors enhance precision agriculture. 
Systemically, (Saran et al., 2022) linked climate-smart, data-driven technologies to farmer resilience and women’s empowerment, and (Chen et al., 2024) revealed how machine learning–mapped ozone pollution significantly reduces total factor productivity in Chinese agriculture. Collectively, this study affirms Random Forest Clustering’s scientific relevance in precision agriculture and underscores the critical integration of input, environmental, and yield data to advance both theory and practice in agribusiness productivity.
Operationally, this study offers cluster-specific policy recommendations to enhance rice productivity across Central Java. The High Productivity (blue) cluster should serve as a national pilot for replicating best practices, including advanced technologies, efficient irrigation, and precision input management. These approaches were validated in the Philippines, where improved irrigation access and farmer training boosted yield and spatial efficiency (Mamiit et al., 2021). Sustainable intensification in these areas also requires balancing water, energy, and food through indicators like irrigation equilibrium (Batisha, 2024). The Medium Productivity (green) cluster benefits most from capacity building and adoption of improved seeds; evidence from Mozambique shows basic agronomic training alone can raise yields by 36% (Kajisa and Vu, 2023), while broader investments in human capital and high-value sectors have proven effective across Asia and Africa (Springer Nature, 2023). The Low Productivity (pink) cluster demands structural, long-term interventions such as infrastructure rehabilitation, soil restoration, and input support, as demonstrated in post-disaster Nepal (Pokhrel, 2021), alongside localized policies like Indonesia’s village fund allocations, which reduce hunger and foster inclusive agricultural development (Manurung et al., 2022). Critically, the value of these clusters lies not in their statistical robustness but in their heuristic utility for targeting interventions. Even with overlapping boundaries, the consistent association of Cluster 1 with high-performing years (V1995, V1988) and Cluster 3 with persistent underperformance provides a practical framework for diagnosing systemic constraints and prioritizing resources. The study also advocates for data-driven technologies to enable precise, sustainable decision-making.
Visual tools (Figures 1a, b) help identify intervention priorities, while satellite data generate prescription maps for precision rice farming in Asia (Sobue, 2023). GIS and remote sensing enhance irrigation efficiency and land suitability mapping (Bwambale et al., 2022), and multispectral imaging enables non-destructive yield forecasting (Xu et al., 2025). Hyperspectral-AI integration supports early disease detection (Ali et al., 2024), while Digital Twin platforms offer real-time simulation of farm systems (Zhang et al., 2025), and AI-IoT-based systems enable end-to-end intelligent farm management (Luo et al., 2025). Overcoming systemic challenges requires integrated data management across spatial and temporal scales (Kharel et al., 2020), exemplified by real-time nitrogen monitoring using simple tools like leaf color charts to optimize fertilizer use (Ravikumar et al., 2024). Together, these strategies bridge data analytics and on-ground action to advance equitable, efficient, and resilient rice production.
5 Conclusion
This study identified three main clusters of rice productivity across 29 districts in Central Java, Indonesia, using the Random Forest Clustering approach. The high-productivity cluster had a mean of 6.8 quintals/ha with a silhouette score of 0.207, the medium cluster a mean of 4.5 quintals/ha with a silhouette score of 0.144, and the low cluster a mean of 2.9 quintals/ha with a silhouette score of 0.143. However, all silhouette scores fall below the conventional threshold of 0.25 (Rousseeuw, 1987), indicating limited internal cohesion and significant overlap among clusters. Similarly, the Dunn index (0.396) and Calinski–Harabasz index (10.088) suggest only modest cluster separation. These metrics reflect the continuous and spatially gradual nature of rice productivity across Central Java, where discrete groupings may not fully capture the underlying reality. Factors such as the use of agricultural inputs, soil conditions, and irrigation access have been shown to contribute significantly to productivity levels. Despite the statistical limitations, the identified clusters offer heuristic value for policy targeting, particularly because the high-productivity cluster is consistently associated with historically strong performance years (e.g., V1995, V1988), while the low-productivity cluster reveals persistent systemic constraints.
Random Forest was used exclusively for feature importance ranking, not as a clustering algorithm; the actual clustering was performed using K-means, an unsupervised method. Field-level empirical validation was beyond the scope of this study; instead, cluster assignments were cross-referenced with secondary administrative data on irrigation infrastructure, fertilizer distribution, and regional agricultural reports to assess face validity. We acknowledge this as a limitation and strongly recommend ground-truthing in future work. Furthermore, the prominence of years such as 1995 and 1988 likely reflects the combined effects of favorable climate conditions, nationwide agricultural intensification programs (e.g., the National Rice Self-Sufficiency Program), and localized investments in irrigation, factors that warrant targeted econometric analysis in subsequent studies.
Further research can develop models by integrating spatial data to enrich clustering analysis with geographic dimensions. In addition, it is important to examine the influence of other factors such as climate change, environmental sustainability, and the impact of government policies on rice productivity. Temporal trend analysis is also necessary to understand changes in productivity patterns over time. Validation of clustering results with empirical data in the field can be done to improve the accuracy and relevance of the results. Future studies might also explore alternative clustering frameworks—such as spatially constrained clustering, Gaussian mixture models, or latent profile analysis—that better accommodate continuous spatial variation and reduce reliance on hard boundaries. These findings provide important insights for the agribusiness sector to optimize rice production through a more precise data-driven approach.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.
Author contributions
AA: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft. AP: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. HH: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Validation, Visualization, Writing – original draft. EP: Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Writing – original draft. NY: Conceptualization, Funding acquisition, Methodology, Project administration, Visualization, Writing – original draft. GA: Funding acquisition, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – review & editing. HA: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Visualization, Writing – original draft. PF: Data curation, Formal analysis, Funding acquisition, Investigation, Resources, Software, Validation, Visualization, Writing – review & editing. DM: Data curation, Funding acquisition, Investigation, Project administration, Resources, Software, Supervision, Validation, Writing – review & editing. SR: Data curation, Funding acquisition, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing – review & editing. RG: Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Software, Supervision, Validation, Visualization, Writing – review & editing. SN: Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft. 
S: Data curation, Formal analysis, Funding acquisition, Investigation, Project administration, Software, Visualization, Writing – original draft. WP: Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Writing – original draft. AL: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was not used in the creation of this manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Ali A., Hassan M. U., and Kaul H. (2024). Broad scope of site-specific crop management and specific role of remote sensing technologies within it—A review. J. Agron. Crop Sci. 210, 333–349. doi: 10.1111/jac.12732
Ali F., Razzaq A., Tariq W., Hameed A., Rehman A., Razzaq K., et al. (2024). Spectral intelligence: AI-driven hyperspectral imaging for agricultural and ecosystem applications. Agronomy 14, 2260. doi: 10.3390/agronomy14102260
Anand V., Rajput P., Minkina T., Mandzhieva S., Kumar S., Chauhan A., et al. (2025). Systematic review of machine learning applications in sustainable agriculture: insights on soil health and crop improvement. Phyton 94, 1339–1365. doi: 10.32604/phyton.2025.063927
Batisha A. (2024). Multi-disciplinary strategy to optimize irrigation efficiency in irrigated agriculture. Sci. Rep. 14, 11433. doi: 10.1038/s41598-024-61372-0
Bhuyan B. P., Tomar R., Singh T. P., and Cherif A. R. (2022). Crop type prediction: A statistical and machine learning approach. Sustainability 15, 481. doi: 10.3390/su15010481
Bolo P., Mucheru-Muna M. W., Mwirichia R. K., Kinyua M., Ayaga G., and Kihara J. (2023). Influence of farmyard manure application on potential zinc solubilizing microbial species abundance in a ferralsol of Western Kenya. Agric. (Switzerland) 13, 2217. doi: 10.3390/agriculture13122217
Bwambale E., Naangmenyele Z., Iradukunda P., Agboka K. M., Houessou-Dossou E. A. Y., Akansake D. A., et al. (2022). Towards precision irrigation management: A review of GIS, remote sensing and emerging technologies. Cogent Eng. 9, 2100573. doi: 10.1080/23311916.2022.2100573
Carcedo A. J. P., Bastos L. M., Yadav S., Mondal M. K., Jagadish S. V. K., Kamal F. A., et al. (2022). Assessing impact of salinity and climate scenarios on dry season field crops in the coastal region of Bangladesh. Agric. Syst. 200, 103428. doi: 10.1016/j.agsy.2022.103428
Chaali N., Ramírez-Gómez C. M., Jaramillo-Barrios C. I., Garré S., Barrero O., Ouazaa S., et al. (2024). Enhancing irrigation management: Unsupervised machine learning coupled with geophysical and multispectral data for informed decision-making in rice production. Smart Agric. Technol. 9, 100635. doi: 10.1016/j.atech.2024.100635
Chaki A. K., Gaydon D. S., Dalal R. C., Bellotti W. D., Gathala M. K., Hossain A., et al. (2022). Achieving the win–win: targeted agronomy can increase both productivity and sustainability of the rice–wheat system. Agron. Sustain. Dev. 42, 113. doi: 10.1007/s13593-022-00847-8
Chen X., Gao J., Chen L., Khanna M., Gong B., and Auffhammer M. (2024). The spatiotemporal pattern of surface ozone and its impact on agricultural productivity in China. PNAS Nexus 3, pgad435. doi: 10.1093/pnasnexus/pgad435
Chen Y., Wutanbieke H., Zhong D., Chen J., Huo Z., and Dong H. (2025). Spatial patterns and key driving factors of wheat harvest index under irrigation and rainfed conditions in arid regions. Front. Plant Sci. 16. doi: 10.3389/fpls.2025.1614204
Chergui N. and KeChadi M. T. (2022). Data analytics for crop management: a big data view. J. Big Data 9, 123. doi: 10.1186/s40537-022-00668-2
Chivenge P., Sharma S., Bunquin M. A., and Hellin J. (2021). Improving nitrogen use efficiency—A key for sustainable rice production systems. Front. Sustain. Food Syst. 5. doi: 10.3389/fsufs.2021.737412
Choudhary A. K., Varatharajan T., Rohullah, Bana R. S., Pooniya V., Dass A., et al. (2020). Integrated crop management technology for enhanced productivity, resource-use efficiency and soil health in legumes - A review. Indian J. Agric. Sci. 90, 1841–1857. doi: 10.56093/ijas.v90i10.107882
Condro A., Setiawan Y., Prasetyo L., Pramulya R., and Siahaan L. (2020). Retrieving the national main commodity maps in Indonesia based on high-resolution remotely sensed data using cloud computing platform. Land 9, 377. doi: 10.3390/land9100377
Folorunso O., Ojo O., Busari M., Adebayo M., Adejumobi J., Folorunso D., et al. (2025). GeaGrow: a mobile tool for soil nutrient prediction and fertilizer optimization using artificial neural networks. Front. Sustain. Food Syst. 9. doi: 10.3389/fsufs.2025.1533423
Gao S., Zhang X., Wang S., Fu Y., Li W., Dong Y., et al. (2024). Progress and hotspot analysis of bibliometric-based research on agricultural irrigation patterns on non-point pollution. Agronomy 14, 2604. doi: 10.3390/agronomy14112604
Gelagay H. S., Leroux L., Tamene L., Chernet M., Blasch G., Tibebe D., et al. (2025). A crop-specific and time-variant spatial framework for characterizing rainfed wheat production environments in Ethiopia. Agric. Syst. 227, 104360. doi: 10.1016/j.agsy.2025.104360
Hammond K., Kerry R., Jensen R. R., Spackman R., Hulet A., Hopkins B. G., et al. (2023). Assessing within-field variation in alfalfa leaf area index using UAV visible vegetation indices. Agronomy 13, 1289. doi: 10.3390/agronomy13051289
Jat H. S., Kumar V., Datta A., Choudhary M., Yadvinder-Singh, Kakraliya S. K., et al. (2020). Designing profitable, resource use efficient and environmentally sound cereal based systems for the Western Indo-Gangetic plains. Sci. Rep. 10, 19693. doi: 10.1038/s41598-020-76035-z
Jia Y., Zhao Y., Ma H., Gong W., Zou D., Wang J., et al. (2024). Analysis of the effects of population structure and environmental factors on rice nitrogen nutrition index and yield based on machine learning. Agronomy 14, 1028. doi: 10.3390/agronomy14051028
Kajisa K. and Vu T. T. (2023). The importance of farm management training for the African rice Green Revolution: Experimental evidence from rainfed lowland areas in Mozambique. Food Policy 114, 102401. doi: 10.1016/j.foodpol.2022.102401
Karunathilake E. M. B. M., Le A. T., Heo S., Chung Y. S., and Mansoor S. (2023). The path to smart farming: innovations and opportunities in precision agriculture. Agric. (Switzerland) 13, 1593. doi: 10.3390/agriculture13081593
Kharel T. P., Ashworth A. J., Owens P. R., and Buser M. (2020). Spatially and temporally disparate data in systems agriculture: Issues and prospective solutions. Agron. J. 112, 4498–4510. doi: 10.1002/agj2.20285
Kumar P., Rawal S., Kumar R., and Kavita (2022). Impact of rice residue management practices on environment, productivity and economics of wheat: A review. Int. J. Environ. Climate Change. 12, 868–880. doi: 10.9734/ijecc/2022/v12i730706
Kuruppuarachchi N., Suriyagoda L. D. B., Pushpakumara G. K. N. G., and Silva G. L. L. P. (2023). Selection of model homegardens: does the district heterogeneity classifies the homegardens? Trop. Agric. Res. 34, 446–459. doi: 10.4038/tar.v34i4.8670
Luo X., Xiong S., Jia X., Zeng Y., and Chen X. (2025). AIoT-enabled data management for smart agriculture: A comprehensive review on emerging technologies. IEEE Access 13, 102964–102993. doi: 10.1109/ACCESS.2025.3578751
Mamiit R. J., Yanagida J., and Miura T. (2021). Productivity hot spots and cold spots: setting geographic priorities for achieving food production targets. Front. Sustain. Food Syst. 5. doi: 10.3389/fsufs.2021.727484
Manurung E. T., Maratno S. F. E., Permatasari P., Rahman A. B., Qisthi R., and Manurung E. M. (2022). Do village allocation funds contribute towards alleviating hunger among the local community (SDG2)? An insight from Indonesia. Economies 10, 155. doi: 10.3390/economies10070155
Nayak H. S., McDonald A. J., Kumar V., Craufurd P., Dubey S. K., Nayak A. K., et al. (2024). Context-dependent agricultural intensification pathways to increase rice production in India. Nat. Commun. 15, 8403. doi: 10.1038/s41467-024-52448-6
Pokhrel S. (2021). Post disaster agricultural strategies for food sufficiency and economic resilience: special focus on Gorkha, Nepal in relation to Barpak earthquake 2015. Nepal Public Policy Rev. 1, 109–137. doi: 10.3126/nppr.v1i1.43438
Qi H., Liu X., Ji T., Ma C., Shi Y., He G., et al. (2024). Hyperspectral remote sensing combined with ground vegetation surveys for the study of the age of rodent mounds. Agriculture 14, 2142. doi: 10.3390/agriculture14122142
Ravikumar S., Vellingiri G., Sellaperumal P., Pandian K., Sivasankar A., and Sangchul H. (2024). Real-time nitrogen monitoring and management to augment N use efficiency and ecosystem sustainability–A review. J. Hazardous Materials Adv. 16, 100466. doi: 10.1016/j.hazadv.2024.100466
Saito K., Senthilkumar K., Dossou-Yovo E. R., Ali I., Johnson J. M., Mujawamariya G., et al. (2023). Status quo and challenges of rice production in sub-Saharan Africa. Plant Production Sci. 26, 302–319. doi: 10.1080/1343943X.2023.2241712
Saran A., Singh S., Gupta N., Walke S. C., Rao R., Simiyu C., et al. (2022). PROTOCOL: Interventions promoting resilience through climate-smart agricultural practices for women farmers: A systematic review. Campbell Systematic Rev. 18, e1274. doi: 10.1002/cl2.1274
Sobue S. (2023). Space based data usage for smart farming. Bio Web Conferences 80, 06009. doi: 10.1051/bioconf/20238006009
Springer Nature (2023). Agricultural Development in Asia and Africa. Eds. Estudillo J. P., Kijima Y., and Sonobe T. (Singapore: Springer Nature). doi: 10.1007/978-981-19-5542-6
Tamene L., Abera W., Bendito E., Erkossa T., Tariku M., Sewnet H., et al. (2022). Data-driven similar response units for agricultural technology targeting: An example from Ethiopia. Exp. Agric. 58, e12. doi: 10.1017/S0014479722000126
Terán-Chaves C. A. and Polo-Murcia S. M. (2021). Cropping pattern simulation-optimization model for water use efficiency and economic return. J. Agric. Eng. 52, 1197. doi: 10.4081/JAE.2021.1197
Tibshirani R., Walther G., and Hastie T. (2001). Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Society. Ser. B: Stat. Method. 63, 411–433. doi: 10.1111/1467-9868.00293
Unjia D., Korav S., Banik B., and Guin A. (2024). Impact of mechanized crop establishment techniques on growth, yield, and microbial dynamics under rice-wheat cropping system: A review. Int. J. Plant Soil Sci. 36, 410–419. doi: 10.9734/ijpss/2024/v36i64643
Vicente L., Peña D., Fernández D., Albarrán Á., Rato-Nunes J. M., and López-Piñeiro A. (2025). Alternate wetting and drying irrigation with field aged biochar may enhance water and rice productivity. Agron. Sustain. Dev. 45, 6. doi: 10.1007/s13593-024-01000-3
Wang L., Zhou Y., Li Q., Xu T., Wu Z., and Liu J. (2021). Application of three deep machine-learning algorithms in a construction assessment model of farmland quality at the county scale: Case study of Xiangzhou, Hubei Province, China. Agric. (Switzerland) 11, 72. doi: 10.3390/agriculture11010072
Wang D., Gao G., Li R., Toktarbek S., Jiakula N., and Feng Y. (2022). Limiting factors and environmental adaptability for staple crops in Kazakhstan. Sustainability (Switzerland) 14, 9980. doi: 10.3390/su14169980
Wang H., Han J., and Yu X. (2024). Who performs better? The heterogeneity of grain production eco-efficiency: Evidence from unsupervised machine learning. Environ. Impact Assess. Rev. 106, 107530. doi: 10.1016/j.eiar.2024.107530
Wang C., Li S., Huang S., and Feng X. (2024). A review of the application and impact of drip irrigation under plastic mulch in agricultural ecosystems. Agronomy 14, 1752. doi: 10.3390/agronomy14081752
Wang Q., Sun L., and Yang X. (2024). Identifying spatial determinants of rice yields in main producing areas of China using geospatial machine learning. ISPRS Int. J. Geo-Information 13, 76. doi: 10.3390/ijgi13030076
Xiao X., Hu X., Liu Y., and Lu C. (2025). Long-term annual changes in agricultural carbon footprints and associated driving factors in China from 2000 to 2020. Agronomy 15, 453. doi: 10.3390/agronomy15020453
Xu J., Song Y., Rui Z., Zhang Z., Hu C., Wang L., et al. (2025). Trend analysis of the application of multispectral technology in plant yield prediction: a bibliometric visualization analysis (2003–2024). Front. Sustain. Food Syst. 9. doi: 10.3389/fsufs.2025.1513690
Yunis R., Halim A., and Pardosi I. A. (2024). Big data analytics for seasonal crop patterns: integrating machine learning techniques. Eastern-European J. Enterprise Technol. 6, 46–56. doi: 10.15587/1729-4061.2024.315066
Yuzugullu O., Lorenz F., Fröhlich P., and Liebisch F. (2020). Understanding fields by remote sensing: Soil zoning and property mapping. Remote Sens. 12, 1116. doi: 10.3390/rs12071116
Zhang R., Zhu H., Chang Q., and Mao Q. (2025). A comprehensive review of digital twins technology in agriculture. Agriculture 15, 903. doi: 10.3390/agriculture15090903
Keywords: random forest clustering, rice productivity, rice clustering, agribusiness, spatial analysis, regional productivity, Central Java
Citation: Asgar A, Prasetyo A, Henanto H, Pramono EP, Yustiningsih N, Atmaji G, Adinegoro H, Fauziah PY, Musaddad D, Rahayu ST, Goenawan RD, Ngudiwaluyo S, Subandrio, Purwanto W and Lukas A (2026) Modeling rice productivity clustering with random forest: implications of regency agribusiness in 1986–2023. Front. Agron. 7:1715877. doi: 10.3389/fagro.2025.1715877
Received: 01 October 2025; Revised: 26 November 2025; Accepted: 27 November 2025;
Published: 09 February 2026.
Edited by:
Aqeel Ahmad, University of Florida, United States
Reviewed by:
Vicente Montano, University of Mindanao, Philippines
Yovi Pratama, Universitas Dinamika Bangsa, Indonesia
Copyright © 2026 Asgar, Prasetyo, Henanto, Pramono, Yustiningsih, Atmaji, Adinegoro, Fauziah, Musaddad, Rahayu, Goenawan, Ngudiwaluyo, Subandrio, Purwanto and Lukas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Andjar Prasetyo, studidaerah@gmail.com; Amos Lukas, amos001@brin.go.id
Ali Asgar1