Modeling Spatial Interaction in Stochastic Frontier Analysis

We compare farm-level efficiency rankings derived from a non-spatial model and a variety of spatial model specifications that account for unobserved heterogeneity in both the production and the efficiency sides of the stochastic frontier model, in an empirical application to rice farming in the Philippines. We show how not accounting for unobserved spatial heterogeneity affects efficiency estimates and farm efficiency rankings. Models that do not account for unobserved spatial heterogeneity show farms to be relatively more inefficient than they are once such heterogeneity is incorporated. More importantly from a policy perspective, the rankings of farms in terms of efficiency change once unobserved spatial heterogeneity is incorporated in efficiency models. We recommend incorporating unobserved spatial effects in both production and efficiency within the stochastic frontier analysis framework to avoid making misleading recommendations to farmers and policymakers.


INTRODUCTION
Agriculture in developing countries is challenged by a growing scarcity of resources, which imposes the need for efficient resource allocation to increase productivity. The Asian rice production sector is a particular example, where input allocation decisions are critical amid growing resource scarcity and the need to improve productivity to ensure food security. There is a growing literature devoted to the analysis of farmers' technical efficiency in developing countries using a stochastic frontier analysis approach (e.g., Idiong, 2007; Balde et al., 2014; Quilty et al., 2014; Michler and Shively, 2015). However, most of the technical efficiency literature has ignored unobserved spatial heterogeneity. Neglecting unobserved spatial heterogeneity in technical efficiency analysis may lead to coefficients that are inefficient or biased (Anselin, 2001). In agriculture, unobserved spatial heterogeneity can arise from farmers emulating each other, the level of infrastructure, or climatic and topographic conditions (Areal et al., 2012a). In some cases, spatial information can be incorporated into the analysis by combining different data sources, including climatic and topographic maps and the location of farms. For instance, Gadanakis and Areal (2020) show that incorporating rainfall and the length of the growing season matters in the technical efficiency analysis of cereal production. However, incorporating information on social interaction is more challenging. There are several reasons why farm and household level networks may influence technical efficiency. Farmers may emulate each other because they may receive the support of agricultural extension agents in farming communities. However, there is overwhelming evidence that farmers still rely on their social networks for information on input allocation, management practices, etc. (Case, 1992; Foster and Rosenzweig, 1996; Bandiera and Rasul, 2006; Langyintuo and Mekuria, 2008; Conley and Udry, 2010; Maertens and Barrett, 2012; Banerjee et al., 2013; Ward and Pede, 2014; Nakano et al., 2018). In this regard, farmers who belong to the same agro-ecological system and share a common resource pool depend on the regulations set within the resource pool for their farming management. For farmers in irrigated areas who share water within a water users' group, this kind of dependency may emerge more strongly than among rainfed farmers, who farm more independently. Lastly, farmers who belong to the same water users' group in irrigated areas may face similar institutional shocks and regulations. Notably, they may have formed similar social preferences through collective irrigation management (Ostrom, 2000; Tsusaka et al., 2015). In this manner, farm-level spatial dependency may arise through socio-economic, agroclimatic, or institutional similarities and could influence farmers' technical efficiency.
New developments in the field of spatial econometrics have made it possible to examine spatial effects in stochastic frontier analysis (SFA) (Areal et al., 2012a; Glass et al., 2013, 2014; Tsionas and Michaelides, 2015). While the effects on SFA results of not incorporating spatial dependency in the inefficiency term or the stochastic production frontier have been shown (Carvalho, 2018; Pede et al., 2018; Tsukamoto, 2019), and a way to incorporate spatial correlation in both the noise and inefficiency terms has been developed by Orea and Álvarez (2019), there are no clear model specification strategies on where and how the spatial dependency should be modeled. Using an empirical application to rice farming survey data in the Philippines, we compare the distribution of farm-level efficiency in four alternative models to illustrate what strategy could be used: (1) a non-spatial model, (2) a spatial model where the spatial dependency is only modeled in the output Y (SAR model), (3) a spatial model where the spatial dependency is only modeled in the errors [as in Areal et al. (2012a) and Pede et al. (2018)], and (4) a combination of models 2 and 3 (SARAR model) (Billé et al., 2018). Model 2 (SAR) captures latent influences associated with production (e.g., unobserved climatic conditions of the farm neighborhood and soil characteristics). Model 3 captures spatial effects associated with efficiency, and Model 4 (SARAR) separates these spatial dependencies into those associated with factors outside the control of the farmer (i.e., spatial dependencies between farm productions that are not production inputs) and those associated purely with management of the farm (managerial practices or sharing of information, for example). Results show the implications of using each of the models for efficiency scores and, in turn, for management and policy recommendations.

MATERIALS AND METHODS
We use panel data collected by the International Rice Research Institute (IRRI) during four consecutive rice seasons from 2009 to 2011 in the Bohol province in the Philippines from 496 rice farmers with farms within a relatively close distance (maximum distance between farms is approximately 13 km); 205 and 291 observations are from rainfed and irrigated ecosystems, respectively. A thorough description of the survey questionnaire is available in JICA (2012). Descriptive statistics of farms and household characteristics were presented in Tsusaka et al. (2015) and Pede et al. (2018). The short distance between farms allows capturing the effect of unobserved spatial heterogeneity, including small networks of relatively close farmers (i.e., spatial dependency associated with information sharing).
The SFA literature addressed the issue of unobserved heterogeneity through panel estimators (Kumbhakar and Lovell, 2000). We follow and expand a one-step Bayesian procedure described in Areal et al. (2012a) and applied by Pede et al. (2018) to estimate Models 2-4. We expand Pede et al. (2018) by considering the SARAR model (Model 4 below) and examining the impact of model selection on farm-level efficiency rankings. We specify the non-spatial model (Model 1) as follows:

$$y_{it} = x_{it}\beta + z_{it}\theta + p_{it}\psi + v_{it} - u_i \tag{1}$$

where $y_{it}$ is the production of farm $i$ for $i = 1, \ldots, N$ at season $t$ for $t = 1, \ldots, T$; $x_{it}$ is the $(1 \times K)$ vector of inputs of production (seed in kg, plot size in ha, labor in man-days, fertilizer in kg, and capital in PHP) and their combinations for farm $i$ at season $t$, following a translog functional form for the production function (i.e., $x$ is in logarithmic form); $z_{it}$ is a $1 \times M$ vector of $M$ nonstochastic environmental variables for farm $i$ at season $t$ that includes the farmer's level of education, household size, whether the household head is female, and remittance; $p_{it}$ is a $1 \times (T-1)$ vector of $T-1$ dummy variables accounting for seasons 2-4; $\beta$, $\theta$, and $\psi$ are unknown parameter vectors to be estimated; $v_{it}$ is a random error; and $u_i$ is the farm inefficiency, which is assumed to be constant across the seasons. Stacking all variables into matrices, we can describe the following spatial extensions of Model 1.
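As a sketch only, the three spatial extensions might be written in stacked form as below. This assumes the standard SAR and spatial-error constructions and a stacking in which the $T$ seasons of each farm are kept together. The symbols $W_T = W \otimes I_T$, $\iota_T$ (a $T \times 1$ vector of ones), and $X$ with coefficient vector $\delta$ (collecting $x$, $z$, $p$ and $\beta$, $\theta$, $\psi$) are notation introduced here for illustration and are assumptions rather than the original notation.

```latex
\begin{align}
\text{Model 2 (SAR):}\quad
  & y = \gamma W_T y + X\delta + v - (u \otimes \iota_T) \\
\text{Model 3:}\quad
  & y = X\delta + v - (u \otimes \iota_T),
  \qquad u = \rho W u + \tilde{u}
  \;\Rightarrow\; u = (I_N - \rho W)^{-1}\tilde{u} \\
\text{Model 4 (SARAR):}\quad
  & y = \gamma W_T y + X\delta + v - (u \otimes \iota_T),
  \qquad u = (I_N - \rho W)^{-1}\tilde{u}
\end{align}
```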
where $W$ is a spatial row-normalized weight matrix with zero diagonal elements, ensuring that all elements in the spatial weight matrix $W$ are non-negative and efficiency estimates are in the unit interval (Kutlu, 2018; Kutlu et al., 2020a); $\gamma$ and $\rho$ are spatial coefficients, assumed to be between 0 and 1, and associated with production and efficiency, respectively; and $u$ and $\tilde{u}$ are latent variables whose distributional form is unknown. Different weight matrix specifications can be used, and their selection is arbitrary (Areal et al., 2012a). We use a distance-based $N \times N$ weight matrix $W$ whose elements $w_{ij}$ are a decreasing function of distance (Equation 5), where $w_{ii} = 0$ precludes direct prediction of Y; $d_{ij}$ is the distance between farms $i$ and $j$ (in km); and $s$ is the cut-off distance around a given observation over which other observations are likely to be dependent. Equation 5 represents this by showing an inverse relationship between the spatial weight matrix value and the distance between farms. The cut-off distance $s$ indicates the point at which this negative relationship is decreasing relatively slowly. Since the cut-off point is unknown a priori, a number of different cut-off points varying from 100 to 1,000 m are used (Areal et al., 2012a; Pede et al., 2018). The distance between farms is the Euclidean distance calculated using the plot coordinates.

We used a translog functional form for all models estimated. We imposed monotonicity and concavity at the mean of the data as well as the inequality conditions required for inefficiency to be non-negative. A Bayesian approach was used to estimate Models 1-4. For Model 1 we assume that the errors $v_i$ follow a normal distribution with mean $0_T$ and covariance matrix $h^{-1} I_T$, where $h$ is the inverse of the variance (the precision); $x_{it}$ are explanatory variables for individual $i$ in period $t$; $v_{it}$ and $v_{jt}$ are independent of one another for $i \neq j$; and $u_i$ and $v_{jt}$ are independent of one another for all $i$ and $j$.
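The distance-based weight matrix described above can be sketched as follows. The Gaussian decay used here is an assumed functional form chosen only because it falls with distance and flattens out beyond $s$; the function name, the toy coordinates, and the underflow guard are likewise hypothetical.

```python
import numpy as np

def distance_weight_matrix(coords_km, s_km):
    """Row-normalized distance-decay weights with a zero diagonal.

    coords_km : (N, 2) array of plot coordinates in km (hypothetical input).
    s_km      : cut-off distance in km.
    The decay exp(-d^2 / (2 s^2)) is an ASSUMED functional form.
    """
    coords = np.asarray(coords_km, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances d_ij
    w = np.exp(-d ** 2 / (2.0 * s_km ** 2))    # weights fall with distance
    np.fill_diagonal(w, 0.0)                   # w_ii = 0
    row_sums = w.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0            # guard: rows that underflow to zero
    return w / row_sums                        # each row sums to one

# Four toy farms; the last one is far from the others
coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.3], [2.0, 2.0]])
W = distance_weight_matrix(coords, s_km=0.5)
```

Re-running the function over a grid of cut-offs (for example 0.1 to 1.0 km) gives one matrix per cut-off distance, which is how sensitivity to $s$ could be examined.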
FIGURE 1 | Spatial dependence at different cut-off distances for model 2 in irrigated farms and rainfed farms.

Frontiers in Sustainable Food Systems | www.frontiersin.org

The conditional likelihood function is proportional to a normal density in $y_i^* = y_i + u_i 1_T$. We complement the conditional likelihood function with priors for $\beta$, $h$, $\mu_u^{-1}$, and $u$. We use an independent normal-gamma prior for the coefficients in the production frontier and the error precision (Koop, 2003). For the inefficiencies, we use an exponential prior distribution, $p(\tilde{u}_i \mid \mu_u^{-1}) \propto \exp(-\mu_u^{-1}\tilde{u}_i)$. The prior for $\mu_u^{-1}$ is parameterized by $r^*$, the median of the prior distribution. Model 2 differs from Model 1 by adding $Wy$ as an explanatory variable to the production function. Its conditional likelihood function is again proportional to a normal distribution, with the prior for $\gamma$ following a normal-gamma distribution.
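A minimal sketch of the two building blocks just described, the normal conditional likelihood in $y_i^*$ and the exponential prior for the inefficiencies, is given below. The array shapes, function names, and simulated data are assumptions made for illustration; the regressor array `X` stands for all frontier regressors stacked together.

```python
import numpy as np

def log_cond_likelihood(y, X, beta, h, u):
    """Log of the normal conditional likelihood, up to an additive constant.

    y : (N, T) outputs; X : (N, T, K) regressors; beta : (K,) coefficients;
    h : error precision; u : (N,) time-invariant inefficiencies.
    """
    N, T = y.shape
    y_star = y + u[:, None]            # y*_i = y_i + u_i 1_T
    resid = y_star - X @ beta          # (N, T) residuals
    return 0.5 * N * T * np.log(h) - 0.5 * h * np.sum(resid ** 2)

def log_exponential_prior(u, mu_inv):
    """Log exponential prior for non-negative inefficiencies."""
    u = np.asarray(u, dtype=float)
    if np.any(u < 0):
        return -np.inf                 # inefficiency cannot be negative
    return u.size * np.log(mu_inv) - mu_inv * np.sum(u)

# Simulated, noise-free data so that the residuals are exactly zero
rng = np.random.default_rng(0)
N, T, K = 5, 4, 3
X = rng.normal(size=(N, T, K))
beta = np.array([1.0, 0.5, -0.2])
u = rng.exponential(0.3, size=N)
y = X @ beta - u[:, None]
loglik = log_cond_likelihood(y, X, beta, h=4.0, u=u)
```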
We follow the Bayesian approach used by Areal et al. (2012a) and Pede et al. (2018) to estimate Model 3. Model 4 is an extension of Model 3 where we add $Wy$ as an explanatory variable to the production function. The conditional posteriors are obtained from the joint posterior distribution $p(\beta, h, \rho, \mu_u^{-1}, u \mid y)$, which includes an indicator function $I(\rho \in [0, 1])$, where $I(\cdot) = 1$ if $\rho \in [0, 1]$ and $I(\cdot) = 0$ otherwise. The conditional posterior for $\tilde{u}_i$ in Models 3 and 4 (making $Wy$ part of $x$ in the production side and $\gamma$ part of the vector $\beta$, for simplification) does not have a recognizable form, as pointed out by Areal et al. (2012a), so a Metropolis-Hastings algorithm is used (Metropolis et al., 1953; Hastings, 1970). We refer interested readers to Areal et al. (2012a) and Pede et al. (2018) for further details on the Bayesian approach used.
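To illustrate the kind of step involved, the following is a generic random-walk Metropolis-Hastings sampler for a scalar restricted to $[0, 1]$ by an indicator function. It is a sketch under stated assumptions, not the sampler used in the study: the target is a stand-in Beta(2, 5) kernel, and the step size, starting value, and function names are hypothetical.

```python
import numpy as np

def metropolis_hastings(log_kernel, start, n_draws, step, rng):
    """Random-walk Metropolis-Hastings for a scalar restricted to [0, 1]."""
    draws = np.empty(n_draws)
    current, current_lp = start, log_kernel(start)
    for k in range(n_draws):
        proposal = current + step * rng.normal()
        # Indicator I(rho in [0, 1]): proposals outside the interval are rejected
        if 0.0 <= proposal <= 1.0:
            proposal_lp = log_kernel(proposal)
            if np.log(rng.uniform()) < proposal_lp - current_lp:
                current, current_lp = proposal, proposal_lp
        draws[k] = current             # repeat the current value on rejection
    return draws

def toy_log_kernel(rho):
    # Stand-in target (Beta(2, 5) kernel), NOT the paper's conditional posterior
    return np.log(rho) + 4.0 * np.log1p(-rho)

rng = np.random.default_rng(42)
rho_draws = metropolis_hastings(toy_log_kernel, start=0.5,
                                n_draws=20000, step=0.2, rng=rng)
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target kernels, so only an unnormalized density is needed, which is what makes the approach usable for non-standard conditionals.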
Once farm-level estimates for $u$ are obtained, farm-level efficiencies are usually calculated as $\exp(-u_i)$. However, in SAR models efficiency estimates need to be corrected (Glass et al., 2016; Kutlu, 2018). There are two approaches to obtaining the corrected farm efficiency levels. Glass et al. (2016) proposed calculating the corrected farm efficiency level as $(I_N - \rho W)^{-1} u_t$, whereas Kutlu (2018) proposed $(I_N - \rho W)^{-1} \exp(-u_t)$, with the former, as pointed out by Kutlu (2018), potentially sensitive to outliers. Therefore, we use the approach proposed by Kutlu (2018) to calculate the corrected farm-level efficiencies.

FIGURE 2 | Spatial dependence at different cut-off distances for model 3 in irrigated farms and rainfed farms.

RESULTS
Figures 1-4 show the mean estimates of the spatial parameters γ and ρ for the different models analyzed, which capture unobserved spatial heterogeneity associated with production and efficiency, respectively, at different cut-off distances between 100 m and 1,000 m. Results show a decrease in unobserved spatial heterogeneity associated with production, γ, and efficiency, ρ, as the cut-off distance increases.
For Model 2, where the spatial dependency is only modeled in the output Y (SAR model), at 100 m the spatial dependence parameter associated with rice production, γ, is 0.017 and 0.040 for irrigated and rainfed farms, respectively (see Table 1). Comparing the posterior conditional distributions for γ for rainfed and irrigated farms shows that the probability of the spatial dependence parameter associated with rice production, γ (100 m), being greater in rainfed than in irrigated farms is 95%. The higher spatial dependency found in rainfed farms may be due to similar climatic characteristics at the neighborhood level being more important in determining rice production for rainfed farms than for irrigated farms. Hence, accounting for such unobserved heterogeneity is relatively more important in rainfed farms, although its absolute effect on production estimation may be small. We also found evidence of spatial homogeneity in the efficiency part of the stochastic frontier (e.g., neighboring farmers conducting similar management practices by sharing information), captured by ρ in Model 3. This spatial dependency effect was found to be relatively higher in irrigated than in rainfed farms. Comparing the posterior conditional distributions for ρ (100 m) for rainfed and irrigated farms shows that the probability of the spatial dependence parameter associated with efficiency, ρ, being greater in irrigated than in rainfed farms is 92%. This is expected since irrigated farmers work more closely together than rainfed farmers do. Farmers in irrigated areas share water within a water users' group under similar institutional shocks and regulations, forming similar social preferences through collective irrigation management (see Ostrom, 2000; Tsusaka et al., 2015). This creates a relatively stronger spatial dependency for them than for rainfed farmers, who farm more independently.
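Probability statements of this kind are typically computed directly from posterior draws. A minimal sketch, assuming one array of retained draws per farm group (the arrays below are made-up numbers, not estimates from the study):

```python
import numpy as np

def prob_greater(draws_a, draws_b):
    """Share of paired posterior draws in which parameter a exceeds parameter b."""
    draws_a = np.asarray(draws_a, dtype=float)
    draws_b = np.asarray(draws_b, dtype=float)
    return float(np.mean(draws_a > draws_b))

# Hypothetical draws of a spatial parameter for two separately estimated groups
draws_rainfed = [0.2, 0.3, 0.4, 0.5]
draws_irrigated = [0.1, 0.35, 0.3, 0.6]
p = prob_greater(draws_rainfed, draws_irrigated)
```

Pairing draws by position is reasonable when the two groups are estimated from separate samples, so that their draws are independent of each other.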
Table 1 shows the estimates for the spatial parameters γ and ρ, which capture unobserved spatial heterogeneity associated with production and efficiency, respectively. Results comprise the mean of the coefficient and the 95% coverage posterior region in brackets. The interpretation of the Bayesian 95% coverage posterior (a, b) is that, according to our data and model, the parameter is between a and b with a 0.95 probability. Figures 5 and 6 show the technical efficiency distribution (kernel density) for the non-spatial model and the spatial Model 2 at different distances for rainfed and irrigated farms, respectively. Modeling the spatial dependency in the output shifts the efficiency distribution of rainfed farms to the right, which indicates the presence of spatial homogeneity for rainfed farms at 100 m distance. However, accounting for spatial dependency in the output of irrigated farms does not significantly alter the farm efficiency distribution from the one obtained by the non-spatial model. This result is due to the spatial dependency found (i.e., unobserved spatial heterogeneity associated with rice production) being relatively small. The reason is that unobserved conditions that may affect irrigated rice production may not vary much within 100 m (note also that farms are located in relatively close proximity, with a maximum distance between farms of 13 km). Figures 7 and 8 show the farm technical efficiency distribution (kernel density) for the non-spatial model and the spatial Model 3, which accounts for unobserved spatial heterogeneity associated with farm technical efficiency, for rainfed and irrigated farms, respectively, at different cut-off distances from 100 to 1,000 m.
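The posterior mean and 95% coverage region reported in this way can be obtained from the retained draws with a few lines. The sketch below uses equal-tailed percentiles, which is one common convention and an assumption here; the evenly spaced draws are purely illustrative.

```python
import numpy as np

def posterior_summary(draws, coverage=0.95):
    """Posterior mean and equal-tailed coverage region from MCMC draws."""
    draws = np.asarray(draws, dtype=float)
    tail = 100.0 * (1.0 - coverage) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail])
    return draws.mean(), lower, upper

# Illustrative draws evenly spread over [0, 1]
mean, lower, upper = posterior_summary(np.linspace(0.0, 1.0, 101))
```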
Accounting for spatial dependency in the efficiency term shifts the efficiency distribution to the right, meaning that part of the inefficiency found in the non-spatial model is due to unobserved spatial homogeneity. In this case, the estimated parameter ρ, which captures the spatial homogeneity present and may reflect neighboring farmers sharing information about management practices, has a greater influence on the technical efficiency estimates than the unobserved spatial heterogeneity associated with production, γ. Figures 9 and 10 show the farm technical efficiency distribution for rainfed and irrigated farms at different cut-off distances from 100 to 1,000 m for the non-spatial model and Model 4. The figures show that controlling for unobserved factors affecting both production (e.g., climatic and soil characteristics) and efficiency, such as managerial aspects (e.g., farmers sharing information), also shifts the technical efficiency distribution to the right. In this case, we obtain close but slightly higher estimates for both the spatial dependence parameter associated with rice production, γ, and the spatial dependence parameter associated with rice production efficiency, ρ, than when using Models 2 and 3, where these are estimated separately.

FIGURE 12 | Technical efficiency ranking differences of models 2-4 compared to Model 1 for irrigated farms.
As pointed out by Areal et al. (2012b) and Areal et al. (2018), it is important to investigate the differences in farms' rankings between the different models used. Once the functional form of the frontier has been altered (e.g., by incorporating previously omitted information such as spatial information), individual efficiency results may well change. Figures 11 and 12 show kernel densities for the change in rankings for the rainfed and irrigated farm models studied, respectively. We found that for rainfed farms the ranking of farms varies on average by five positions when using Model 2 and by four positions when using Model 3 at the 100 m cut-off distance, relative to Model 1 (non-spatial), with the ranking of some farms varying by up to 60 positions. Using Model 4 makes these ranking differences wider: the average variation in ranking is over nine positions, with some farms varying by up to 71 positions.
For irrigated farms we find that there is less variation in the changes in ranking of farms for Model 2 compared with rainfed farms but more variation for Model 3 and Model 4, with 25 farms changing more than 20 positions and one farm dropping 99 ranking positions under Model 4.
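The ranking comparison reported above amounts to ranking farms under each model and differencing the ranks. A minimal sketch with made-up efficiency scores for five farms (the scores, names, and tie-breaking rule are assumptions for illustration):

```python
import numpy as np

def efficiency_ranks(eff):
    """Rank farms from most efficient (rank 1) to least; ties keep input order."""
    eff = np.asarray(eff, dtype=float)
    order = np.argsort(-eff, kind="stable")
    ranks = np.empty(eff.size, dtype=int)
    ranks[order] = np.arange(1, eff.size + 1)
    return ranks

def ranking_changes(eff_base, eff_alt):
    """Per-farm rank change plus mean and maximum absolute change."""
    change = efficiency_ranks(eff_alt) - efficiency_ranks(eff_base)
    return change, float(np.mean(np.abs(change))), int(np.max(np.abs(change)))

# Hypothetical efficiency scores under a non-spatial and a spatial model
eff_model1 = np.array([0.60, 0.75, 0.90, 0.55, 0.80])
eff_model4 = np.array([0.85, 0.78, 0.92, 0.70, 0.81])
change, mean_abs, max_abs = ranking_changes(eff_model1, eff_model4)
```

A negative change means a farm moves up the ranking (toward rank 1) under the alternative model. A kernel density of `change` across farms corresponds to the kind of plot described for Figures 11 and 12.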

DISCUSSION
Accounting for spatial dependency in SFA avoids biases associated with neglecting information on unobserved spatial heterogeneity. Importantly, spatial dependency may arise from both the production and the efficiency side of SF models; therefore, the use of flexible models that take both into account is recommended. Crucial information on climatic, topographic, soil, and social conditions is usually disregarded in efficiency analysis studies, and it is rarely included in farm production surveys. Sometimes, relevant information such as climatic data can be collected from different data sources and combined with production information at the farm level using geographical information. However, other information may be more difficult to obtain or be unavailable (e.g., whether information is shared between farmers). Hence, incorporating spatial dependency is especially important to control for any unobserved spatial heterogeneity that may be present.
We have shown the need for incorporating unobserved spatial heterogeneity into the production and efficiency sides of stochastic frontier analysis. We have used these models to show a case where the researcher faces uncertainty about which model to use. Of course, a number of alternative models are possible depending on how much information there is about the nature of the spatial effects and how the spatial effects are specified. For instance, we could specify the spatial effects in a way that captures neighbor externalities [a spatial lag of X model (SLX)], where the output is associated with the average of neighboring inputs. Another important issue worth pointing out is the choice of specification for the spatial weight matrix W, which defines the spatial relations between the farms. Spatial weight matrices can be categorized into three main groups: distance-based spatial weight matrices (such as the one used here), boundaries-based spatial weight matrices, and combined distance-boundaries-based spatial weight matrices. Since spatial relations are unknown a priori, researchers must make assumptions about them. These are often limited by the type of information available on the geographical location of farms (e.g., coordinates, district, region). Importantly, as found by Areal et al. (2012a), the way in which the connectivity matrix is specified may have an impact on the levels of efficiency obtained, and this needs to be acknowledged and/or tested (e.g., by using different specifications for the connectivity matrix).
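For the SLX alternative mentioned above, the neighbor-average inputs are simply the weight matrix applied to the input matrix. A toy sketch, with a hand-written row-normalized W and made-up inputs (both hypothetical):

```python
import numpy as np

# Row-normalized weights for three farms with a zero diagonal
W = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

# Two inputs per farm (for example, logged seed and fertilizer)
X = np.array([[10.0, 2.0],
              [20.0, 4.0],
              [30.0, 6.0]])

WX = W @ X                      # weighted average of neighbors' inputs
X_slx = np.hstack([X, WX])      # own inputs followed by neighbor averages
```

Because each row of W sums to one, each row of `WX` is a weighted mean of the other farms' inputs, and `X_slx` could then replace `X` as the regressor matrix in the frontier.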
We showed how neglecting unobserved spatial heterogeneity can have important effects on (a) how a sector is categorized in terms of its level of efficiency and (b) how farms can be wrongly targeted for policy support. Regarding the first effect, not accounting for unobserved spatial heterogeneity may lead to wrongly concluding that farms are relatively more inefficient than they actually are. This may have broad implications if a particular sector (e.g., rice producers) is considered to be inefficient and resources are therefore allocated to it through policy action (e.g., financial support). Wrongly targeting farms for policy support is a consequence of the effect that not accounting for unobserved spatial effects may have on farm technical efficiency rankings. The fact that the ranking of farms varies once new relevant information is incorporated into the efficiency analysis is important from a policy perspective, for instance where policymakers need to identify farms in need of support. Identifying such farms using efficiency models that do not account for unobserved spatial heterogeneity may lead to targeting the wrong farms. We advocate incorporating unobserved spatial effects in both production and efficiency in SFA to avoid negative implications for efficiency analysis and for any recommendations to farmers and policymakers derived from it.
It is worth pointing out that we have focused on the effects of neglecting unobserved spatial heterogeneity in SFA models, but other types of heterogeneity (non-spatially dependent) may be present and are also important to account for in SFA models. In this regard, SFA model extensions to account for unobserved heterogeneity, endogeneity, time-varying inefficiency, and time-invariant individual effects have been developed (Greene, 2005a,b; Wang and Ho, 2010; Kutlu and Tran, 2019; Kutlu et al., 2020b). Also, here we assumed that inputs are exogenous variables in the SFA models and controlled for endogeneity derived from omitted or unavailable explanatory variables by capturing spatially dependent latent influences. However, endogeneity may also occur when there is non-spatially dependent correlation between the inputs and statistical noise or inefficiency. Recent literature on heterogeneity and endogeneity in SFA has developed other ways to account for these issues (Amsler et al., 2016; Lai and Kumbhakar, 2018; Kutlu and Tran, 2019).
Finally, taking into consideration the issues highlighted above, we advocate for future model developments focusing on integrating approaches to account for both unobserved spatial and non-spatial heterogeneity and endogeneity.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available upon request to the International Rice Research Institute (IRRI).

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by International Rice Research Institute. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
FA and VP contributed to the data analysis and drafting the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The financial support for the survey data collection was provided by the Japan International Cooperation Agency (JICA) and Japan International Research Center for Agricultural Science (JIRCAS). VP's time on this research was supported by the RICE CGIAR Research Program.