Using machine learning to evaluate 1.2 million studies on small-scale farming and post-production food systems in low- and middle-income countries

Recent developments have emphasized the need for agrifood systems to move beyond a production-oriented approach to recognize agriculture as part of a broader agrifood system that prioritizes livelihoods, social equity, diets, and climate and environmental outcomes. At the same time, the knowledge base for agriculture is growing exponentially. Using artificial intelligence and machine learning approaches, we reviewed more than 1.2 million publications from the past 20 years to assess the current landscape of agricultural research taking place in low- and middle-income countries. The result is a clearer picture of what research has been conducted on small-scale farming and post-production systems from 2000 to the present, and where persistent evidence gaps exist. We found that the greatest focus of the literature is on economic outcomes, such as productivity, yield, and incomes. There is also some emphasis on identifying and measuring environmental outcomes. However, noticeable data gaps exist for agricultural research focused on nutrition and diet, and gender and inclusivity.

The Food and Agriculture Organization of the United Nations (FAO), likewise, supports the International System for Agricultural Science and Technology to collect the data needed to measure SDGs (Lowder et al., 2021). Solutions in domain-specific knowledge areas such as agriculture and livelihoods, environment and natural resource management, nutrition and health, and human capital and education are often found within the scientific literature. Expert knowledge, often in the form of scientific papers and other written analysis, is key to developing these solutions, as decisions need to be taken by integrating multiple information sources, incorporating accumulated experience, and weighing uncertainty. At the same time, the amount of available information is increasing exponentially (estimates suggest that human knowledge is doubling every 10-15 years), which makes it increasingly difficult to provide evidence-based interventions while avoiding the risk of confirmation bias or cherry-picking (Bornmann and Mutz, 2015; Bornmann et al., 2021).
Natural language processing (NLP) and machine learning can be highly effective at uncovering insights from large and representative datasets, helping us to make better use of the data in existing scientific publications. NLP is a branch of artificial intelligence that deals with the interpretation and manipulation of human language by computers. Machine learning is the use of computers to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data. Both machine learning and NLP approaches are designed to handle classification tasks with speed and accuracy, especially in datasets that lack metadata (Gil et al., 2014).
Recent work has enabled NLP to perform information extraction and summarization using relevant data from various sources. Such approaches have transformed how we can approach text-based classification. Pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT), SciBERT, and named-entity recognition with BERT are highly adept at capturing the context-dependent meaning of words even before additional training for other tasks that require expert input in the form of training data (Devlin et al., 2018; Beltagy et al., 2019; Luoma and Pyysalo, 2020). This can save significant time and money while delivering new insights.
One of the most important features of machine learning and NLP approaches is that they allow for a better understanding of the degree to which data and analyses are capturing systematic interactions. This study reports on the use of machine learning to process and analyze 1.2 million summaries of past publications from a representative dataset of agricultural research focused on low- and middle-income countries. Its primary aim is the summarization of data to inform a series of open-ended questions that are difficult to answer because the data are scattered across millions of individual studies. These questions include:
• Who are the user groups included within studies?
• What are the interventions and outcomes most studied by researchers?
• What is the research output across low- and middle-income countries?
(Bornmann et al., 2021). We targeted CABI's CAB Abstracts in part because of CABI's mission to identify and aggregate research from low- and middle-income countries, making it among the best databases in the world for our purposes. Analyses similar to ours, focused on agriculture and region-specific agricultural components such as rice research in low- and middle-income countries, indicate the suitability of CAB Abstracts for such analyses (Rafols et al., 2020; Amarante et al., 2021).
We reduced the initial set of 1.3 million citations to 1.2 million by removing duplicates to produce our final dataset for analysis. No further reduction using more specific inclusion criteria was undertaken in this effort. Artificial intelligence-assisted techniques were used to summarize abstracts by the categories shown in Figure 1. NLP for text extraction and large-scale machine-learning language models were used to model the data for tasks associated with the identification of study user population, interventions, outcomes, geography, and crop type, among other elements. The categories were determined a priori in consultation with the expert-assembled Commission on Sustainable Agriculture Intensification (CoSAI). The prioritization of some specific tasks by the CoSAI groups enabled a more focused approach for the machine learning.
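The text does not specify how duplicate citations were detected. As a minimal sketch, assuming duplicates can be matched on a normalized title plus publication year, the step could look like the following. The field names `title` and `year`, the helper names, and the sample records are hypothetical and are not taken from the study.

```python
import re
import unicodedata


def citation_key(title: str, year: int) -> tuple:
    """Build a normalized key so trivially different copies of a citation match."""
    # Strip accents, lowercase, and collapse punctuation and whitespace.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (text, year)


def deduplicate(records: list) -> list:
    """Keep the first record seen for each normalized (title, year) key."""
    seen = set()
    unique = []
    for record in records:
        key = citation_key(record["title"], record["year"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented sample records: the first two differ only in case and punctuation.
records = [
    {"title": "Drip irrigation and maize yield in Kenya", "year": 2015},
    {"title": "Drip Irrigation and Maize Yield in Kenya.", "year": 2015},
    {"title": "Rice value chains in Vietnam", "year": 2018},
]
unique_records = deduplicate(records)
```

Matching on an exact normalized key is deliberately conservative: it removes only near-verbatim copies and leaves genuinely distinct studies with similar titles in place.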

Machine-learning to identify agricultural interventions, outcomes, and study design types
Identifying interventions, outcomes, study design types and more is normally undertaken during an evaluation of the evidence on a specific topic, such as part of an impact assessment or a systematic review, by domain experts looking through thousands of underlying original research papers. A well-trained machine-learning model can accelerate the labeling for many of these tasks. This study further contributes to exploring the role of computation to accelerate evidence and impact synthesis work in agriculture and climate change scientific publication datasets (Callaghan et al., 2021).

FIGURE 1
The various categories into which unstructured text summaries were analyzed using AI-assisted techniques.

Training data assembled through collaborative coding in previous exercises, including more than 2,500 high-quality papers from across the Ceres2030: Sustainable Solutions to End Hunger project, was used to enhance an artificial intelligence pipeline that supports classification and information extraction tasks (identified in Figure 1) for agriculture and related areas in international development (Acevedo et al., 2020; Baltenweck et al., 2020; Bizikova et al., 2020; Liverpool-Tasie et al., 2020; Nature, 2020a; Piñeiro et al., 2020; Porciello et al., 2020; Ricciardi et al., 2020; Stathers et al., 2020). In addition, the underlying models have been continuously trained on tasks supporting diverse development literature as a result of other partnerships, including use in new domains such as water, hygiene, and sanitation, digital agriculture, and development and humanitarian assistance, all of which required the identification of outcomes and interventions (Garbaro et al., 2020; Jardine, 2021; Porciello and Ivanina, 2021).
Unlike the health and medical sector, which maintains an International Classification of Health Interventions through the World Health Organization (WHO), agrifood systems lack a similar standardized taxonomy of interventions. One of the most powerful structured collections of agricultural concepts, terms, definitions, and relationships, FAO's AGROVOC, defines an intervention simply as a "controlled price" (AGROVOC: AGROVOC Multilingual Thesaurus, n.d.). This definition is a sparse interpretation of the range of potential activities that can be used to support policies and programs to improve agrifood systems. Other organizations, including the OECD, recommend expanding the interpretation beyond price interventions to include more agricultural, humanitarian, and development sector activities (OECD, 2019).
We developed a proxy to guide how an unstructured text corpus can be searched for literature that describes interventions without necessarily using the term "intervention." Training of the model for interventions included searching articles and summary data for synonyms of intervention, and was enhanced using Word2vec. Word2vec was chosen because of its more than decade-long history of performing NLP tasks to find syntactic and semantic similarities of words. Word2vec's shallow language model is appropriate for small and relatively heterogeneous datasets such as ours, and it has low computational costs, taking <1 day to learn high-quality word vectors from a 1.6-billion-word dataset. Similar models, such as Global Vectors (GloVe), could be used in conjunction with or instead of Word2vec with similar results, although training time might slightly increase (Sharma et al., 2017, p. 2). Using pre-trained Google News and Wikipedia Word2vec models, concepts similar to interventions in the agricultural domain were identified, including "program or programme," "strategy," and "government initiative."
Next, to surface all potential and specific interventions, we incorporated a semi-unsupervised, model-based approach via coreference resolution models to support NLP tasks by linking noun phrases with entities in the text. A training dataset was applied that broadly represented how interventions were described in the literature as technological, socioeconomic, and ecosystem service interventions. These categories are described further in the results section. We then sought to surface and label how more specific interventions, such as drip irrigation or solar irrigation, could be represented and labeled as part of a narrow cluster of interventions, such as "irrigation" interventions. Next, the model was trained to identify outcomes. Unlike interventions, there are standardized definitions for outcomes (Table 1 in Results).
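The synonym search rests on ranking words by the similarity of their embedding vectors. The sketch below illustrates only that ranking step with cosine similarity. The three-dimensional vectors are invented for illustration, whereas a pre-trained Word2vec model would supply learned vectors with hundreds of dimensions; the function names are likewise hypothetical.

```python
import math

# Toy vectors, invented for illustration only. They are not taken from
# any trained model and carry no empirical meaning.
TOY_VECTORS = {
    "intervention": [0.9, 0.8, 0.1],
    "programme": [0.8, 0.9, 0.2],
    "strategy": [0.7, 0.6, 0.3],
    "maize": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=2):
    """Rank every other word in the vocabulary by similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


neighbours = most_similar("intervention", TOY_VECTORS)
```

With these toy vectors, "programme" and "strategy" rank above "maize", which mirrors how near-synonyms of "intervention" would surface ahead of unrelated crop terms.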
The model was trained to detect when an outcome was mentioned and had a relationship to narrow classes from the intervention. A single example consists of a sentence, an intervention from the ontology and/or a plant or animal product from the AGROVOC dictionary, and an outcome from the sentence. When the model detects that an outcome is connected with a particular intervention in the context of a sentence, it labels the citation with the appropriate outcome based on the general definition.
International Classification of Health Interventions (WHO): https://www.who.int/standards/classifications/international-classification-of-health-interventions
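The structure of a single training example described above can be written down as a small record type. This is a sketch under stated assumptions: the class name, field names, and validation rules are hypothetical, and the reading that an example needs an intervention, a product term, or both is one interpretation of the "and/or" in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutcomeExample:
    """One labeled example: a sentence, its anchoring terms, and an outcome."""

    sentence: str
    outcome: str                          # outcome phrase taken from the sentence
    intervention: Optional[str] = None    # term from the intervention ontology
    product: Optional[str] = None         # plant or animal product term (e.g. AGROVOC)
    outcome_label: Optional[str] = None   # general outcome category (assumed field)

    def __post_init__(self):
        # Assumed rule: at least one anchoring term must be present.
        if self.intervention is None and self.product is None:
            raise ValueError("an example needs an intervention, a product, or both")
        # The outcome is described as coming "from the sentence".
        if self.outcome.lower() not in self.sentence.lower():
            raise ValueError("the outcome must appear in the sentence")


# Invented example sentence, for illustration only.
example = OutcomeExample(
    sentence="Drip irrigation increased maize yield among smallholders.",
    outcome="yield",
    intervention="drip irrigation",
    product="maize",
    outcome_label="economic",
)
```

Validating examples at construction time keeps malformed annotations out of the training set before any model sees them.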
Both rule-based and transformer-based models were used for this task with similar results. A rule-based support-vector machine (SVM) was used in a semi-unsupervised approach to organize studies according to NLP-derived intervention, outcome, and study design type taxonomies. A combined SVM, k-nearest neighbors, and stochastic gradient boosting approach was used for classifying specific interventions, where all the supporting content (in this case, summary data) is examined in a vector space. The SVM is a supervised classification algorithm that learns by example to discriminate among two or more given classes of data, and it works well with high-dimensional data, especially for smaller datasets. In addition, BERT-based models are designed for sentence-level and token-level tasks and are useful for identifying relationships in small pieces of text. BERT models including base BERT, RoBERTa, ALBERT, SciBERT, and DistilBERT were tested. DistilBERT Named Entity Recognition (NER) uses the BERT architecture but performs knowledge distillation during pretraining, which yields a lighter, faster, and cheaper transformer model and reduces the size of a BERT model by 40%. Due to the size of the labeled dataset, models were trained by freezing all layers (which are responsible for encoding the text) except the last two layers (where classification occurs).
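The layer-freezing step can be illustrated without committing to a particular framework. The sketch below assumes only that each parameter exposes a boolean `requires_grad` flag, as PyTorch parameters do; the function name and the six-layer stand-in model are hypothetical and are not the models used in the study.

```python
from types import SimpleNamespace


def freeze_all_but_last(layers, n_trainable=2):
    """Freeze every layer except the last `n_trainable` ones.

    `layers` is an ordered list of layers, each given as a list of
    parameter-like objects with a boolean `requires_grad` attribute.
    """
    if n_trainable < 0:
        raise ValueError("n_trainable must be non-negative")
    cutoff = max(len(layers) - n_trainable, 0)
    for index, layer in enumerate(layers):
        for param in layer:
            # Only layers at or beyond the cutoff keep receiving gradients.
            param.requires_grad = index >= cutoff
    return layers


# Stand-in for a six-layer network with two parameters per layer.
toy_model = [
    [SimpleNamespace(requires_grad=True) for _ in range(2)]
    for _ in range(6)
]
freeze_all_but_last(toy_model, n_trainable=2)
trainable = [all(p.requires_grad for p in layer) for layer in toy_model]
```

Freezing the encoder in this way limits the number of weights updated during training, which is a common choice when the labeled dataset is small relative to the size of the model.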
Finally, study design types also lack common definitions. These were labeled using expert data and the transformer model SciBERT, which has been pre-trained on scientific articles (Beltagy et al., 2019). For other tasks, text extraction models, including pre-trained spaCy, specialized dictionaries, and ontologies of AGROVOC and the National Agricultural Library Thesaurus, were used to identify and label geography, plants, animals, diseases, research leadership and funding, and study populations.

Results
One of the most useful ways to report the findings of this analysis is through an evidence gap map (Figure 2), a visual and interactive tool that provides an overview of all evidence collected on a particular issue (Vincent et al., 2022). Evidence gap maps enable policy makers and practitioners to review findings, explore the quality of the existing evidence, and make evidence-based decisions in international development policy and practice. They also identify key "gaps" where little or no research has been published (Snilstveit et al., 2016).
The key components of an evidence gap map are interventions and outcomes. The evidence gap map identifies the most frequently studied interventions as determined by a threshold of at least 10,000 articles and categorizes them into one of three broad categories of agricultural research (socioeconomic, technological, and ecosystem services). Importantly, an evidence gap map does not prioritize or claim there is a single intervention that is "a silver-bullet" to support agricultural development outcomes. Rather, the intention is to surface volumes of research and where more, and less, emphasis has been placed.
Technological interventions constitute the use of practices and technologies (both direct and indirect) to support agricultural production and food systems (Acevedo et al., 2020; FAO, 2022a,b). Indirect uses include underlying technology such as biotechnology to improve seeds, whereas direct uses include irrigation, mechanization, and inputs such as fertilizer. Socioeconomic interventions include market and finance interventions that contribute to accessing markets, credit or other financial products, or investments in value chain development, as well as interventions that increase knowledge or awareness, transfer skills, and build capacities, such as education (Liverpool-Tasie et al., 2020). This category also includes policy and government interventions, such as government, funder, or other organizational programs and policies to support farmers and agri-food system actors through incentives or direct support, and includes interventions to improve inclusion of women and other marginalized groups (Barrett et al., 2020).

FIGURE 2
An intervention and outcome evidence gap map identifying the most frequently studied interventions and associated outcomes.

Ecosystem services interventions focus on improving ecosystem services with regulating and supporting functions such as clean air, nutrient cycling, pollination, erosion control, carbon storage, and more (Piñeiro et al., 2020). Additional analysis can be conducted to further subdivide the categories for additional, discrete analysis. The evidence gap map in Figure 2 shows the frequency of interventions per outcome, expressed as a percentage across the literature. For instance, over 50% of plant breeding interventions in the literature are associated with outcomes related to economic growth, whereas 11-20% are associated with nutrition outcomes, 21-30% with environmental outcomes, and <10% with women's empowerment and inclusion. Table 1 provides outcome descriptions and definitions. The most frequently reported outcomes in the literature are economic, such as productivity, yield, and incomes. This reflects the fact that agricultural research and innovation literature has been largely focused on improving productivity of a small number of crops rather than focusing on other important aspects of crop research, such as dietary diversity (Serraj and Pingali, 2018). Some emphasis has been placed on identifying and measuring environmental outcomes, including water use and health, across many of the intervention categories, especially those focused on ecosystem services.
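The cells of an evidence gap map of this kind can be derived by counting, for each intervention, the share of its articles that mention each outcome. The following is a minimal sketch under stated assumptions: the input format, the function names, and the exact band edges are hypothetical (the text reports bands such as <10%, 11-20%, 21-30%, and over 50%, but not how boundaries are handled), and the toy data are invented.

```python
from collections import defaultdict

MIN_ARTICLES = 10_000  # threshold the text gives for "most frequently studied"


def share_band(percent):
    """Map a percentage to a display band (band edges are assumed)."""
    for upper in (10, 20, 30, 40, 50):
        if percent <= upper:
            return "<=10%" if upper == 10 else f"{upper - 9}-{upper}%"
    return ">50%"


def gap_map(articles, min_articles=MIN_ARTICLES):
    """Build {intervention: {outcome: band}} from (intervention, outcomes) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for intervention, outcomes in articles:
        totals[intervention] += 1
        for outcome in outcomes:
            hits[intervention][outcome] += 1
    result = {}
    for intervention, total in totals.items():
        if total < min_articles:
            continue  # too few articles to appear on the map
        result[intervention] = {
            outcome: share_band(100 * count / total)
            for outcome, count in hits[intervention].items()
        }
    return result


# Invented toy data; a lowered threshold keeps the example small.
toy = (
    [("plant breeding", {"economic"})] * 6
    + [("plant breeding", {"economic", "environment"})] * 2
    + [("plant breeding", {"nutrition"})] * 2
    + [("beekeeping", {"economic"})]
)
cells = gap_map(toy, min_articles=5)
```

Because an article can mention several outcomes, the percentages for one intervention need not sum to 100.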
The data gaps are more noticeable for agricultural research focused on nutrition and diet, and on women's empowerment and other inclusivity outcomes mentioned in the literature, such as increased knowledge obtained through training and education programs. For the latter, the gaps are widespread across all intervention categories. Figure 3 provides a regional-level overview of the publication trends focused on specific crops mentioned in title and abstract data. Table 2 provides a breakdown of the specific crops included in each category; their inclusion was determined a priori through consultation (as referenced in the introduction). Generalized terms such as cover crops, livestock feed crops, container plants, bee plants, beverage crops, and oils were excluded from the mapping because it was unclear from the summary what crops they referred to, and because they totaled fewer than 25,000 mentions. Each study could receive multiple labels, meaning that more than one relevant label could be applied. For instance, if a study focused on wheat, maize, and rice in Vietnam and Thailand, then the study would be counted as "1" in all subsequent categories.
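The multi-label counting rule can be sketched as follows. This assumes that "all subsequent categories" means every country and crop combination a study mentions; the record layout and function name are hypothetical.

```python
from collections import Counter
from itertools import product


def count_by_country_and_crop(studies):
    """Count each study once for every (country, crop) pair it mentions."""
    counts = Counter()
    for study in studies:
        # set() guards against a label listed twice within the same study.
        pairs = product(set(study["countries"]), set(study["crops"]))
        for country, crop in pairs:
            counts[(country, crop)] += 1
    return counts


# Invented records mirroring the example in the text.
studies = [
    {"countries": ["Vietnam", "Thailand"], "crops": ["wheat", "maize", "rice"]},
    {"countries": ["Vietnam"], "crops": ["rice"]},
]
counts = count_by_country_and_crop(studies)
```

Under this rule the category totals exceed the number of studies, so they describe research attention per category, not a partition of the dataset.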
China, Brazil, and India lead the way in publishing research outputs, but different countries and regions come into focus depending on the target crops, as highlighted by the maps in Figure 3. Perhaps as expected, countries that are home to a major international research center, such as the International Maize and Wheat Improvement Center in Mexico or the International Rice Research Institute in the Philippines, have a higher prevalence of research related to the specific crops being studied. Other grains that are important for food security, such as millet and sorghum, have a smaller cumulative total of around 10,000 articles.
The findings on study design types by research categories (Figure 4) show research activities that report on non-human experiments, such as field trials, laboratory, and simulation studies. A total of six labels were created to identify study design types: field study, experimental study, simulation/modeling study, narrative/review study, laboratory study, and observational study. Each citation received only one study type. The categories along the Y axis are CABI Codes. CABI Codes is an index of 23 major subject areas related to the area of the citation, each with its own set of subcodes (https://www.cabdirect.org/help/about-cabicodes.html). CABI Codes are added by the vendor when an article is included in the CAB Abstracts database. This provides an existing, manually curated index of research topics that does not rely on machine learning. The subject area of agricultural economics has the largest number of observational studies, followed by field crops, meteorology and climate, and water resources.
Finally, a multi-label approach was used to capture information about the study population communities, including when studies mention descriptions of age, sex, affiliation with indigenous or other communities, and agricultural workers, including farmers. Despite a generalized, multi-labeling approach, the data collection and reporting on user populations is very weak. Only about 25% of studies reported any information about a population of study. Though there may be widespread acknowledgment that women, farming communities and others in the agricultural workforce face significant challenges, there is a risk they will be undermined in these types of global assessments by weak data collection practices regarding demographics and other specific descriptions and/or underreporting in the literature (Teeken et al., 2018).

Prioritizing research gaps
The way we think about agriculture is currently undergoing a major shift away from a focus on production and toward a broader understanding that puts agriculture in the larger context of an agrifood system with complex interactions between food production, processing, consumption, nutrition, social change, and climate change (Barrett et al., 2020;Lipper et al., 2020). This shift implies a need to rethink the role of agricultural research and development efforts, and push for innovations that go beyond productivity. There is a corresponding urgency to identify priority investments (Reardon et al., 2019;Laborde et al., 2020). To do so, however, we must have an adequate and accessible evidence base for understanding agricultural innovations and their potential in the context of a transformation.
Integrated approaches across interventions are more effective in achieving gains across the entire food system. Therefore, the relative scarcity of research emphasizing diet, nutrition, and women's empowerment relative to the long-standing priorities of productivity and yield in agricultural research should not necessarily lead us to conclude that some areas of research only need to "catch up" to others. Simply focusing on expanding the literature in one of the relatively under-researched areas will not address the yawning gap of evidence on the interactions that occur across various outcomes.
However, not all areas where there is a dearth of research can be treated equally or with the same urgency. There are many areas of research where we have gaps in the evidence on the impact of interventions on specific outcomes (Figure 3), but identifying where significant trade-offs between outcomes can arise from interventions is key in the context of analyzing the food system and its interactions (Fuso Nerini et al., 2018; Kroll et al., 2019). For example, the lack of research on fruits, vegetables, and more nutritious grains such as millet and sorghum (Figure 3), as well as on accompanying postharvest storage to ensure safety and reduce loss, is a gap in our understanding relevant not only to improving diets and addressing micronutrient deficiencies, but also to gender and inclusivity, given the high rates of female participation in horticultural and postharvest activities (Kennedy et al., 2017; Nordhagen, 2021).
There is too little data being reported in agrifood systems literature about study populations, and about the impacts and uptake of innovations across small-scale farmers and their communities. Better identification of relevant characteristics of the people and communities involved in agricultural activities is essential to understanding the outcomes of interventions and the interactions that arise across different outcomes. Part of the issue is the extremely ambiguous description of farmers and agricultural workers. These descriptions rarely include contextual clues about the type or size of farm they work on. Similar gaps were reported in another evidence analysis, which found that only 2-3% of studies across a portfolio of scoping reviews reported on the conditions and interventions of farmers in low- and middle-income countries (Nature, 2020b). Given that SDG 2 emphasizes the conditions of poor farmers in low- and middle-income countries, high-impact, applied research to identify and report on successful programs across all outcomes in these countries is urgently needed.

FIGURE 3
A country-level look at research production across specific crops.

Equally important for the future of research is the capture of social equity and sociodemographic details that could underscore how barriers are systematic for some communities and not for others. Socioeconomic status, race, class, and gender can create interdependent systems of discrimination that reinforce the exclusion of some groups (particularly, but not only, women) from the benefits of certain programs and innovations. The ability to look at social factors as a system is essential to avoid tendencies to overgeneralize and assign certain characteristics to entire groups, such as the elderly, youth, or women (Sumberg and Hunt, 2019). A recent scoping review focused on digital agriculture identified that fewer than 30% of all studies reported socioeconomic and demographic data. This shortcoming is of particular concern in the context of assessing multiple and potentially interacting outcomes from agricultural research.
In a 2020 review of literature on factors influencing the adoption of sustainable agriculture, farmer characteristics, including asset levels, experience, and risk preferences, were a key factor in explaining farmers' behavior, particularly where there were potential trade-offs between environmental and economic outcomes (Piñeiro et al., 2020). In discussing the reasons for the lack of progress in transforming small-scale agriculture, Woodhill et al. (2020) cite a lack of understanding of the diversity of characteristics and contexts of small-scale farmers as a major factor. Here, again, the issue of multiple and potentially competing outcomes from agricultural change was important. As we look toward the future of research prioritization, equity outcomes need to become more pronounced (Davis et al., 2022; Laderchi et al., 2022).
In this respect, agricultural and food systems studies fall well behind other disciplines, such as medicine and health. Coordinating bodies in health and medicine, such as Cochrane, draft guidance and minimum standards for synthesis conduct, develop methodologies and training capacity, and commission and publish high-quality reviews. The absence of such coordination and synthesis in agricultural sciences has contributed to the evidence gaps mapped in this study. These gaps should no longer be ignored. Simply focusing on expanding the literature in one of the relatively under-researched areas will not address the yawning gap of evidence on the interactions that occur across various outcomes with interventions into any one piece of the system. Assessments of progress on the myriad impacts of what, where, when, and why are often commissioned individually by donors with little opportunity for coordination. Moreover, despite the existence of gaps in data collection, such as the absence of sociodemographic data about farmers that we have highlighted above, the lack of an organizing body means that there is currently no group to champion long-term change in research practices, methodologies for synthesis conduct, and data collection.
The aim of this study was to uncover relevant insights across primary studies, and it used only titles, abstracts, and other available summary metadata. However, what authors choose to emphasize in the title, abstract, and other summary data is influenced by various editorial decisions between themselves and the journals publishing the materials. For instance, some journals may ask authors to refrain from mentioning too many details in the abstract, such as the user population of study, countries of focus, or specific plants. Access to the full text is needed to evaluate the claims made in the summary data, such as whether the interventions and outcomes recognized in the abstract are substantially supported with high-quality data in the study (Garbaro et al., 2020; Porciello and Ivanina, 2021). Evidence from the Covid-19 Open Research Dataset (CORD-19) demonstrates the value of obtaining copyright and permissions clearance from commercial publishers to support text mining and NLP research on scientific papers. CORD-19 is an open access collection of more than one million scientific papers related to coronavirus, published between March 13, 2020 and June 2, 2022, with the full text of nearly 370,000 papers available for text mining (Wang et al., 2020).
The opportunity to read and rapidly discover insights from primary scientific research during Covid-19 is useful to all scientists and policy-makers, and CORD-19 computational tools for text-mining delivered additional, rapid insight on internationally collaborative work, and the contributions of funders, countries, institutions, and fields throughout the pandemic (Wagner et al., 2022).
A demand-driven approach to obtaining access to critical research is relevant for the agrifood community considering the current global food crisis (Laborde and Glover, 2022). For instance, recent research on over 1.2 million children in 44 low- and middle-income countries suggests that experiencing the current crisis of food inflation increases the risks of both stunting and wasting in children under 5, including infants, as well as decreasing diet quality for older children (Headey and Ruel, 2022). Greater visibility of critical agrifood research, complemented with computational tools to extract and classify "what works" and major gaps in the evidence base, is urgently needed to help policymakers implement relevant policies that may mitigate disastrous consequences, especially for vulnerable populations.

Conclusion
Using machine learning to analyze and quantify data gaps in agricultural research allows for greater understanding of the degree to which data and analyses are capturing systematic interactions. These approaches are currently unavailable through other means, including expensive subscription databases. This approach to defining important concepts like interventions can be especially useful in disciplines like agriculture and food systems, where well-coordinated, standardized evidence synthesis is lacking. Machine learning approaches enable us to perform close readings of a large, representative dataset and provide descriptive details that can be used to inform research agendas and prioritization. Studies like this are necessarily limited in their observations and analysis to what we can glean from summary data, given that full-text analysis of more than one million papers requires extensive processing time. In this study, the capture of mentions of interventions and their outcomes presents a useful "bird's-eye view" for future interrogations of the data, but both access to and additional evaluation of the underlying studies are needed to verify whether the identified interventions and outcomes are consistent with the findings of each study. Still, such approaches allow opportunities to track research over time to create a global monitoring and evaluation framework.

Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: CAB Direct, GitHub.

Author contributions
JP oversaw study design, data analysis and methodology, and manuscript development. MI provided coding and computation support. LL contributed to data review and manuscript development. All authors contributed to the article and approved the submitted version.

Funding
This research was commissioned by the Commission for Sustainable Agriculture Intensification (CoSAI) and supported by funders contributing to the CGIAR Trust Fund.