Original Research ARTICLE
Foodsheds in Virtual Water Flow Networks: A Spectral Graph Theory Approach
- 1Ensaras, Champaign, IL, United States
- 2Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
A foodshed is a geographic area from which a population derives its food supply, but a method to determine boundaries of foodsheds has not been formalized. Drawing on the food–water–energy nexus, we propose a formal network science definition of foodsheds by using data from virtual water flows, i.e., water that is virtually embedded in food. In particular, we use spectral graph partitioning for directed graphs. If foodsheds turn out to be geographically compact, this suggests that the food system is local, which in turn reduces the energy and externality costs of food transport. Using our proposed method, we compute foodshed boundaries at the global scale and at the national scale for two of the largest agricultural countries: India and the United States. Based on our determination of foodshed boundaries, we are able to better understand commodity flows, whether foodsheds are contiguous and compact, and other factors that impact environmental sustainability. The formal method we propose may be used more broadly to study commodity flows and their impact on environmental sustainability.
Economic and environmental historians have traditionally considered the flow of natural resources from hinterland to metropolis (Cronon, 1991), but there is growing specialization, interconnection, and flow among all regions within nations and further among nations of the world. A data-driven, systems-level understanding of the food–water–energy nexus is fundamentally linked to these flows. From a water perspective, food and energy systems can be thought of as users of the resource; from a food perspective, water and energy can be thought of as inputs to production; from an energy perspective, water is a required resource and food is a kind of output.1
In understanding flows of water, it is useful to define watersheds as partitions of land into distinct drainage basins where all incoming water has a common convergence point. These are defined from topography and physical geography. In understanding flows of energy, it is common to consider regional partition structure, e.g., the regional transmission organization of power grids, which are determined largely from political concerns (Hughes, 1985).
What about flows of food? The notion of a foodshed has been put forth as a geographic area from which a given population derives its food supply (Hedden, 1929; Peters et al., 2009; Horst and Gaolach, 2015), but no method to formally define partition regions of land surface as foodsheds has been made precise. We propose a purely data-driven approach to do this based on agricultural trade, with the intention of then interpreting the results in terms of geographic, political, and economic factors that may have influenced the foodshed regions that emerge. We hypothesize that unlike watersheds or energy regions, foodsheds will be determined by both social and natural forces.
We draw on spectral graph theory for directed networks (Chung, 2005) to develop a precise definition. In particular, we use spectral graph partitioning for directed networks (Gleich, 2006; Malliaros and Vazirgiannis, 2013) constructed from agricultural trade data as the basis for defining foodsheds. This spectral approach to graph partitioning is chosen since it is natural, in the sense of capturing the costs of physical flows due to intimate connections with metric embedding. Other approaches to graph partitioning, such as those based on random walks, are not natural in this sense.
Since food is multidimensional, unlike water and energy, which are essentially scalar commodities, we need a method to commensurate flows of food and be able to measure them from a unified analysis of a wide variety of disparate data sources. Notwithstanding other possibilities, we accomplish this using the concept of virtual water flow, which estimates the amount of water embedded in the flow of food.2 That is, the water resources used to produce food commodities are virtually transferred alongside, in a virtual water trade. The virtual water content of a commodity is the volume of water used to produce that commodity (Hoekstra and Chapagain, 2008). Thus, a ton of pork bellies will be cast as more flow than a ton of oranges, since pork production is more water-intensive.
Note that data on transfers of commodities are starting to be analyzed using network-theoretic techniques, as a way to understand our world (Hausmann et al., 2014). Virtual water networks arising from food flow have previously been studied using network science analysis (Konar et al., 2011, 2012; Lin et al., 2014; Dang et al., 2015), but foodsheds have not been defined in that mathematical framework. Also note that there is no a priori reason to suspect foodsheds will be geographically contiguous. A given Persian Gulf metropolis might have food trade restricted to kumquats sourced from Greece, buckwheat sourced from New York, and honey sourced from Ethiopia, while exporting camel milk back to these three places. If foodsheds turn out to indeed be geographically contiguous like watersheds, this indicates the importance of physical geography in shaping virtual water flows.
Foodshed analysis studies factors influencing movement of food from its origin as agricultural commodities on a farm to its destination as food wherever it is consumed. It has been noted that “tools are needed to determine how the environmental impact and vulnerability of the food system are related to where food is produced in relation to where it is consumed. To this end, analyses of foodsheds…can provide useful and unique insights” (Peters et al., 2009).
Using our novel definition of foodsheds and data from Kampman (2007), we perform an initial foodshed analysis at the subnational level in India. We find foodsheds in India are indeed largely geographically compact, informing ongoing debates about local food systems and also larger questions on food system sustainability. Physical proximity between food producers and consumers, as in compact foodsheds, may reduce the energy needed to transport foods (Volpe et al., 2013), as well as associated negative externalities like greenhouse gas emissions (Pretty et al., 2005).3 Indian foodshed structure supports locality. This point is further demonstrated by noting the flow network can be embedded in the geographic connectivity network without much distortion. On the other hand, using 2007 Commodity Flow Survey data from the U.S. Census and virtual water commensuration that follows (Mekonnen and Hoekstra, 2011; Mubako, 2011; Dang et al., 2015), we find foodsheds that are not geographically contiguous. Looking at 2008 world virtual water flow data from Konar et al. (2011, 2012), some geographically local foodsheds do emerge.
The remainder of the paper is organized as follows. Sec. 2 discusses the definition and nature of virtual water flow. Sec. 3 introduces needed notions and definitions from spectral graph theory. Sec. 4 demonstrates our basic approach by finding virtual water flow-based foodsheds in India, the United States, and globally. Sec. 5 discusses our results, placed in the context of sustainable development, and also lists several avenues for future work.
2. Virtual Water Flow
Agriculture is by far the largest consumer of the world’s freshwater resources, accounting for 70% of total freshwater use (Koehler, 2008). Virtual water is the amount of water used to produce a particular product and can be used to quantify the water consumed in producing agricultural commodities (Hoekstra and Chapagain, 2008). In this paper, we use virtual water to commensurate food flow data and to identify the foodshed boundaries within a given geography. Commensurating food flow data into virtual water flows requires combining a variety of datasets from several different sources, and is an interesting challenge in and of itself, as we describe in the next paragraphs. There are other ways of commensurating food data, such as energy intensity, emissions intensity, commodity value, or commodity weight. We have chosen to focus on virtual water herein, as it may provide insights on the sustainability of current foodsheds with regards to their impact on fresh water resources.
Agricultural consumptive water use is considered to be equal to the soil and plant surface evaporation plus plant transpiration, collectively known as evapotranspiration. One might think that water that evaporates will fall in the same place, but standard analyses of the hydrological cycle treat evapotranspiration as losses (Dingman, 2002). Although water is also stored in plant tissues, the volume of water present in the tissue is negligible compared to the volume of water evapotranspirated and is typically ignored when calculating agricultural consumptive water use (Jensen, 1968).
We use here virtual water data developed by Kampman (2007), which is based on trade data from 77% of total crop production in India. In this case, virtual water is the sum of the total green water, blue water, and gray water consumed for crop production. Green water refers to a crop’s rainwater consumptive use, and blue water refers to a crop’s ground water and surface water consumptive use. Further, gray water is the amount of freshwater required to dilute polluted water generated from the crop’s growth activities to acceptable levels. Interstate crop import and export trade data are used to determine the incoming and outgoing virtual water flow between states, and this is the basis of the virtual water data we use to develop foodshed boundaries within India.
Virtual water data for the United States was derived using a simplified version of the methodology developed by Dang et al. (2015). Serving as the basis for the virtual water flow estimates are data on the movement of food commodities within the United States as obtained from the 2007 Commodity Flow Survey (CFS) (US Department of Transportation, 2007).4 The CFS is part of the Economic Census and is conducted every 5 years, where a sample of establishments based on geographic location and industry are selected and requested to report on shipments during the survey year. The CFS gathers information such as commodity code, description, value, weight, mode of transportation, and final destination. Based on the survey responses, estimates for the entire industry are made. The data are aggregated by state and CFS area; we consider the state level data aggregation in our analysis. The most recent year for which bilateral movement data are available from the CFS is 2007, hence it is the year that is used in our analysis. We start with the food flow values in weight for the five food commodity groups that are considered as staple food groups (Dang et al., 2015). We remove the fish fraction of the animal food commodity groups by determining the non-fish fraction of animal production in each state from the US Department of Agriculture (2007). We then use virtual water content factors developed from Mekonnen and Hoekstra (2011) and Mubako (2011), weighted by agricultural production by state as per the National Agricultural Statistics Service, to determine the virtual water content associated with each food commodity group. As a final step, the virtual water content for each food commodity group is added and represents the total blue and green virtual water flow between states for the year 2007.
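The commensuration step described above can be sketched numerically. The following is a minimal illustration, not the authors' pipeline: the two regions, the two commodity groups, the tonnages, and the water content factors are all hypothetical, and applying each factor by region of origin is an assumption made here for simplicity.

```python
import numpy as np

# Hypothetical shipment weights in tons (illustrative values only):
# flows_tons[g][i, j] is the weight of commodity group g shipped from region i to region j.
flows_tons = {
    "cereal": np.array([[0.0, 120.0], [30.0, 0.0]]),
    "meat": np.array([[0.0, 10.0], [5.0, 0.0]]),
}

# Hypothetical virtual water content in cubic meters per ton, one value per
# producing region (assumption: the factor of the origin region applies).
vwc_m3_per_ton = {
    "cereal": np.array([1500.0, 1800.0]),
    "meat": np.array([6000.0, 5500.0]),
}


def virtual_water_flow(flows, vwc):
    """Convert per-commodity weight flows to one virtual water flow matrix."""
    total = None
    for group, tons in flows.items():
        # Scale each row (origin region) by that region's water content factor.
        water = tons * vwc[group][:, None]
        total = water if total is None else total + water
    return total


# A[i, j] is the total virtual water (m^3) flowing from region i to region j.
A = virtual_water_flow(flows_tons, vwc_m3_per_ton)
print(A)
```

Summing over commodity groups is what makes a ton of a water-intensive commodity count for more flow than a ton of a less water-intensive one.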
The global virtual water flows, comprising total blue and green virtual water flows between nations for the year 2008, were provided by Konar et al. (2011, 2012); the virtual water flow derivation methodology is described therein.
3. Spectral Directed Graph Partition
Consider treating virtual water flow data as a directed network among distinct regions and denote the directed adjacency matrix of the virtual water flow network as A. In this section, we describe approaches for visualizing the network, and then determining foodsheds. We draw on a spectral approach for graph partitioning of a directed graph; partitioning of directed graphs is not nearly as well studied as of undirected graphs (Malliaros and Vazirgiannis, 2013), but the directionality of flow is critical here.
To visualize a virtual water flow network, we use spectral graph drawing (Koren, 2005), which is designed for undirected networks. As such, we first find the symmetrized adjacency matrix $A_u = A + A^T$. Let $d_j$ indicate the degree of vertex j in $A_u$, and let D be the degree matrix of the graph $A_u$, which takes value $d_j$ along the diagonal and value 0 otherwise. The Laplacian matrix of a graph, L, satisfies $L = D - A_u$. The eigenvalues of L are denoted $\lambda_1(L) \le \lambda_2(L) \le \cdots \le \lambda_n(L)$. Since L is symmetric, all of its eigenvalues are real, and eigenvectors corresponding to different eigenvalues are orthogonal.
Then, we use the degree-normalized Laplacian eigenvectors corresponding to the second and third smallest eigenvalues as coordinates for plotting the network nodes.
Given our interest in understanding minimal distance for food transport, note that placing region nodes according to Laplacian eigenvector coordinates corresponding to the small eigenvalues minimizes the weighted quadratic cost in transporting goods (Hall, 1970; Varshney, 2013). This follows from developing the graph conductance (Cheeger constant) of the network and invoking the celebrated Courant–Fischer min–max theorems.
To be more explicit, consider the quadratic cost for a graph with adjacency matrix $A_u$, with edge weights $a_{ij}$. Vertices are drawn in two-dimensional Euclidean space with horizontal placement $x = (x_1, \ldots, x_n)^T$ and vertical placement $y = (y_1, \ldots, y_n)^T$. Then, cost W is
$$W = \frac{1}{2} \sum_{i} \sum_{j} a_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right].$$
Using the graph Laplacian L, it is $W = x^T L x + y^T L y$. Then, under some non-triviality constraints detailed in Varshney (2013), this cost is minimized by a placement such that $x$ is the eigenvector associated with λ2 and $y$ is the eigenvector associated with λ3 due to the Courant–Fischer theorems. The incurred cost is then λ2 + λ3.
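The relation between the quadratic cost and the Laplacian spectrum can be checked numerically. The sketch below uses a small hypothetical directed flow matrix and the plain (not degree-normalized) Laplacian eigenvectors with unit norm, which is one way to satisfy the non-triviality constraints; the function names are illustrative.

```python
import numpy as np


def spectral_layout(A):
    """Return coordinates from the 2nd and 3rd Laplacian eigenvectors of A + A^T."""
    A_u = A + A.T                       # symmetrized adjacency matrix
    D = np.diag(A_u.sum(axis=1))        # degree matrix
    L = D - A_u                         # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    x = eigvecs[:, 1]                   # eigenvector for lambda_2
    y = eigvecs[:, 2]                   # eigenvector for lambda_3
    return x, y, eigvals, A_u


def quadratic_cost(A_u, x, y):
    """Weighted sum of squared distances, counting each unordered pair once."""
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    return 0.5 * np.sum(A_u * (dx ** 2 + dy ** 2))


# Hypothetical directed flow weights among four regions.
A = np.array([
    [0.0, 3.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0],
    [0.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 2.0, 0.0],
])

x, y, eigvals, A_u = spectral_layout(A)
cost = quadratic_cost(A_u, x, y)
print(cost, eigvals[1] + eigvals[2])
```

With unit-norm eigenvectors the two printed numbers agree up to floating-point error.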
To determine best partitions, we work with the directed graph itself, rather than symmetrizing into an undirected graph. We aim to find partitions that segregate flows within regions such that most flows remain within regions and there is little flow between regions; we desire cuts that separate regions to be at bottlenecks of flow. As shown by Chung (2005), appropriately defined graph Laplacians for directed graphs satisfy the Cheeger inequality, and therefore partitioning according to the Laplacian eigenvectors leads to the best segregation of flows within regions, i.e., this leads to a natural definition of foodsheds (recall that the Cheeger constant of a network measures the level of bottlenecks therein). As such, there is no need to consider other possible graph partitioning algorithms that do not yield natural groupings of geographical areas based on flow.
Let us restrict A to its largest strongly connected component, with adjacency matrix W, and construct a corresponding Markov transition matrix P:
$$P_{ij} = \frac{W_{ij}}{\sum_{k} W_{ik}}.$$
It can be shown that P has a unique left eigenvector ϕ with $\phi_i > 0$ for all i, which we normalize to a unit eigenvector satisfying $\phi P = \phi$ and $\sum_i \phi_i = 1$, and which we call the Perron vector of P. We use the Perron vector to define a Perron matrix Φ, which is diagonal with ϕ on the diagonal.
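As a rough illustration of this construction, the sketch below row-normalizes a hypothetical strongly connected weight matrix and extracts the Perron vector with a dense eigendecomposition. Normalizing the vector to sum to one is an assumption about the intended unit normalization, and the function names are illustrative.

```python
import numpy as np


def transition_matrix(W):
    """Row-normalize edge weights so each row of P sums to one."""
    return W / W.sum(axis=1, keepdims=True)


def perron_vector(P):
    """Left eigenvector of P for eigenvalue 1, scaled to be positive and sum to one."""
    vals, vecs = np.linalg.eig(P.T)            # left eigenvectors of P
    k = np.argmin(np.abs(vals - 1.0))          # pick the eigenvalue equal to 1
    phi = np.real(vecs[:, k])
    return phi / phi.sum()                     # fixes both sign and scale


# Hypothetical strongly connected flow network among three regions.
W = np.array([
    [0.0, 2.0, 1.0],
    [1.0, 0.0, 3.0],
    [4.0, 1.0, 0.0],
])

P = transition_matrix(W)
phi = perron_vector(P)
Phi = np.diag(phi)                             # Perron matrix
print(phi)
```

A dense eigendecomposition is adequate for networks with tens or hundreds of regions; power iteration would be a natural alternative for larger graphs.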
Now we can define the Laplacian matrix of a directed graph as:
$$\mathcal{L} = I - \frac{\Phi^{1/2} P \Phi^{-1/2} + \Phi^{-1/2} P^{*} \Phi^{1/2}}{2},$$
where ⋅* is the Hermitian transpose and I is the identity matrix of appropriate size.
Now we find the eigenvalues and eigenvectors of this Laplacian matrix. It is symmetric, so its eigenvalues are real, and since the graph is strongly connected, the smallest eigenvalue is 0. In analogy to spectral graph theory of undirected graphs, let us refer to the second smallest eigenvalue as the directed algebraic connectivity and the corresponding eigenvector as the directed Fiedler vector.
To partition vertices into two groups, we use the sign of entries in the directed Fiedler vector: those with positive sign are placed in one partition, whereas those with negative sign are placed in the other partition.
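One bisection step can be sketched as follows, assuming the directed Laplacian of Chung (2005) built from the row-normalized transition matrix and its Perron vector. The six-region network is hypothetical, with two tightly coupled groups joined by weak links, and treating a zero entry as positive is an arbitrary tie-breaking choice made here.

```python
import numpy as np


def directed_laplacian(W):
    """Symmetric Laplacian of a strongly connected weighted directed graph."""
    P = W / W.sum(axis=1, keepdims=True)        # Markov transition matrix
    vals, vecs = np.linalg.eig(P.T)
    phi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    phi = phi / phi.sum()                       # Perron vector, positive, sums to 1
    s = np.sqrt(phi)
    M = (s[:, None] * P) / s[None, :]           # Phi^{1/2} P Phi^{-1/2}
    return np.eye(W.shape[0]) - (M + M.T) / 2.0


def fiedler_bisection(W):
    """Split nodes by the sign of the directed Fiedler vector."""
    vals, vecs = np.linalg.eigh(directed_laplacian(W))  # ascending eigenvalues
    fiedler = vecs[:, 1]
    return fiedler >= 0, vals


# Hypothetical network: strong flows inside {0, 1, 2} and {3, 4, 5},
# weak flows 2 -> 3 and 5 -> 0 linking the two groups.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    W[a, b] = 10.0
W[2, 3] = 1.0
W[5, 0] = 1.0

in_first_part, vals = fiedler_bisection(W)
print(in_first_part, vals[1])
```

Which group receives the positive sign is arbitrary, since an eigenvector is defined only up to sign; only the grouping itself is meaningful.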
As is common in spectral partitioning, creating more than two groups involves iterative hierarchical bisection (Leicht and Newman, 2008). One criterion for stopping the bisection process is when the graph modularity no longer increases, i.e., the graph is already well partitioned. We generalize the modularity defined by Leicht and Newman (2008) to weighted directed graphs as follows.
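Definition 1. The modularity of a weighted directed graph is:
$$Q = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{\mathrm{in}} k_j^{\mathrm{out}}}{m} \right] \delta_{c_i, c_j},$$
where, following the convention of Leicht and Newman (2008), $A_{ij}$ is the weight of the edge from node j to node i, m is the total sum of all edge weights, $k_i^{\mathrm{in}}$ is the total weight of incoming edges to node i, $k_j^{\mathrm{out}}$ is the total weight of outgoing edges from node j, and $\delta_{c_i, c_j}$ indicates whether nodes i and j are in the same partition.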
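Definition 1 can be evaluated directly with array operations. In the sketch below, `A[i, j]` is assumed to hold the weight of the edge from node i to node j; because the same-partition indicator is symmetric, the value of Q does not depend on which orientation convention is used. The network and labels are hypothetical.

```python
import numpy as np


def directed_modularity(A, labels):
    """Modularity of a weighted directed graph for a given node labeling."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    m = A.sum()                                  # total edge weight
    k_out = A.sum(axis=1)                        # outgoing weight of each node
    k_in = A.sum(axis=0)                         # incoming weight of each node
    same = labels[:, None] == labels[None, :]    # same-partition indicator
    expected = np.outer(k_out, k_in) / m         # null-model weight for edge i -> j
    return np.sum((A - expected) * same) / m


# Hypothetical network: two groups of three regions with weak links between them.
A = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    A[a, b] = 10.0
A[2, 3] = 1.0
A[5, 0] = 1.0

Q_whole = directed_modularity(A, [0, 0, 0, 0, 0, 0])
Q_split = directed_modularity(A, [0, 0, 0, 1, 1, 1])
print(Q_whole, Q_split)
```

An unpartitioned graph always has Q = 0, so a bisection is worth keeping only if it yields a positive value.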
Definition 2. We define the natural partition of a geographic region into foodsheds as the hierarchical bisection of a food flow network according to directed Fiedler vectors until modularity stops improving.
Again, to emphasize the point, spectral partitioning approaches are natural for flow networks such as virtual water trade, since they restrict flow within regions as much as possible (Chung, 2005).
4. Empirical Foodsheds
We apply Def. 2 to the India virtual water flow data of Kampman (2007). There is virtual water flow data for 21 Indian states or union territories; regions with negligible agricultural trade are omitted. See Table 1, which also gives standard abbreviations and a listing of neighboring states or union territories. We find that although the virtual water flow network is weakly connected, it is not strongly connected as Delhi, Jharkhand, and Kerala only import virtual water but do not export. We depict all 21 states in our spectral visualization, Figure 1, but only define foodsheds comprising the 18 regions in the large strongly connected component.
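Restricting a flow network to its largest strongly connected component can be sketched as below. This is an illustrative approach based on a boolean reachability matrix, which is adequate for a few dozen regions; the four-region network is hypothetical, with the last region only importing, as the excluded states do.

```python
import numpy as np


def largest_strongly_connected_component(A):
    """Return the sub-matrix and node indices of the largest strongly connected component."""
    n = A.shape[0]
    reach = (A > 0) | np.eye(n, dtype=bool)      # paths of length 0 or 1
    while True:
        # Squaring the reachability relation doubles the path length covered.
        nxt = (reach.astype(int) @ reach.astype(int)) > 0
        if np.array_equal(nxt, reach):
            break
        reach = nxt
    mutual = reach & reach.T                     # i and j can reach each other
    seed = int(np.argmax(mutual.sum(axis=1)))    # node in the largest component
    keep = np.flatnonzero(mutual[seed])
    return A[np.ix_(keep, keep)], keep


# Hypothetical flows: regions 0, 1, 2 trade in a cycle; region 3 only imports.
A = np.array([
    [0.0, 5.0, 0.0, 2.0],
    [0.0, 0.0, 4.0, 0.0],
    [3.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])

W, keep = largest_strongly_connected_component(A)
print(keep)
```

A region that only imports has no outgoing edges, so no other region is reachable from it and it falls outside the component.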
Figure 1. Spectral layout of Indian states according to total virtual water flow, with directed spectral graph partitioning used to separate states into foodsheds. One foodshed is indicated as green, the other as orange, and nodes not part of the large strongly connected component are drawn in black.
Applying a first spectral bisection, we obtain the partition into two regions that are depicted in “flow space” in Figure 1 and depicted geographically in Figure 2. We find that modularity Q improves from 0 for the whole graph to 0.1656 after the first bisection. If we hierarchically do further bisection on either of the two regions as depicted in Figures 3 and 4, the modularity Q decreases to 0.0979 and to 0.0143, respectively. Thus, the natural partition of the virtual water flow network into foodsheds is just two foodsheds for India.
Figure 2. Geographic layout of two directed spectral foodsheds among the Indian states according to total virtual water flow. One foodshed is indicated as green, the other as orange, and states not part of the large strongly connected component or with missing data are drawn in white.
Figure 3. Spectral layout of Indian states according to total virtual water flow, with hierarchical directed spectral graph partitioning used to separate states into four regions. Regions are indicated as green, orange, red, and blue; nodes not part of the large strongly connected component are drawn in black.
Figure 4. Geographic layout of directed spectral regions among the Indian states according to total virtual water flow. Regions are indicated as green, orange, red, and blue; states not part of the large strongly connected component or with missing data are drawn in white.
Although there are formal approaches for defining geographic continuity (Wu and Murray, 2008), we clearly observe that the two foodsheds of India are largely contiguous. Only Karnataka is geographically separated from other states in its foodshed.
Using the dataset and virtual water flow quantitation described above, we study virtual water flow among the states of the United States in 2007. We find that spectral partitioning stops at two foodsheds, as for India, but that these foodsheds are not largely geographically contiguous (see Figure 5). Note that we believe there are errors in the 2007 Commodity Flow Survey data for Indiana, which is why Indiana is not in the strongly connected component.
Figure 5. Geographic layout of directed spectral regions among the United States according to total virtual water flow. Regions are indicated as red and blue; states not part of the large strongly connected component or with missing data are drawn in white.
Using 2008 world data as described above, we repeat the same spectral bipartition analysis and end up with three foodsheds: two are fairly small and compact, whereas the other contains all other countries in the world. Note that many countries of the world are not part of the strongly connected component of global commerce. The first foodshed is in southern Africa and also includes Papua New Guinea. The second is largely around southwestern Asia and the Indian Ocean rim. See Table 2. Note that India (Goswami and Nishad, 2015) and the United States are major net exporters of water-intensive food crops in global trade. Thus, in a global characterization of foodsheds, these countries play central roles and are in fact in distinct global foodsheds.
In this paper, we have put forth a formal data-driven definition of foodsheds from network science and computed foodsheds for India, the United States, and the world. We found that the Indian foodsheds are largely geographically contiguous, in the sense that states within foodsheds all border each other. Note that this notion of contiguity implicitly assumes the use of land transport, but ship-based transport may be more energy efficient (Van Passel, 2013). In contrast, foodsheds in the United States are not nearly as contiguous, but some geographically compact foodsheds do emerge in the global flow data. However, in all three examples, the foodsheds are relatively large and indicate that both intranational and international food flows are highly connected between many states and nations.
Here, we defined foodsheds in terms of virtual water flow, but several alternative definitions could be considered, whether based on the tonnage of food itself, the total price of food, the virtual energy flow as embedded in food, or the negative externalities embedded in food. We expect that the results are fairly robust to the particular commensuration of flow used to define foodsheds. One can also further specialize data to consider, say, rice-sheds or potato-sheds. One may even be able to extend spectral graph techniques for directed networks to the setting of multilayer networks, to consider several notions of flow simultaneously.
It is our contention that the formal data-driven methodology we proposed for defining foodsheds can be used more broadly to study a variety of commodity flows and the impact these flows have on sustainability. Such information can be useful in developing environmentally oriented policies. Coming full circle, the effect of policy on foodshed boundaries can be visualized using the method we propose. That is, spectral graph partitioning of directed graphs may be a general analysis technique for sustainability science and public policy, cf., Hausmann et al. (2014).
Can we quantitatively determine which foodsheds are sustainable and which are not? We had noted that foodsheds are governed by the combined force of natural and social factors, and as we saw, we have contiguous (potentially sustainable) foodsheds if the natural geometry of the commodity flow network is essentially embeddable in the geographic adjacency network. That is, we may have sustainability, at least from the perspective of transport energy and externalities, if active human factors do not distort the natural way of things, as formalized in the sense of graph embedding.
Both authors contributed to all aspects of the paper.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- 1. The above connections are not meant to be exhaustive, but rather illustrate some of the ways in which these resources are fundamentally linked.
- 2. Not only can water be embedded in food but energy can also be embedded in water, as the ice trade of yore demonstrates (Cummings, 1949): there really is a nexus not just at the level of public policy, but in the commodities themselves.
- 3. Food transport over land may, however, be more energetically and environmentally costly than over water. Thus, we should take care in interpreting results.
- 4. The CFS is a collaborative effort between the Bureau of Transportation Statistics and the Census Bureau.
Konar, M., Dalin, C., Hanasaki, N., Rinaldo, A., and Rodriguez-Iturbe, I. (2012). Temporal dynamics of blue and green virtual water trade networks. Water Resour. Res. 48, W07509. doi:10.1029/2012WR011959
Konar, M., Dalin, C., Suweis, S., Hanasaki, N., Rinaldo, A., and Rodriguez-Iturbe, I. (2011). Water for food: the global virtual water trade network. Water Resour. Res. 47, W05520. doi:10.1029/2010WR010307
Pretty, J. N., Ball, A. S., Lang, T., and Morison, J. I. L. (2005). Farm costs and food miles: an assessment of the full cost of the UK weekly food basket. Food Policy 30, 1–19. doi:10.1016/j.foodpol.2005.02.001
Keywords: data, food, virtual water, water, spectral graph theory
Citation: Kshetry N and Varshney LR (2017) Foodsheds in Virtual Water Flow Networks: A Spectral Graph Theory Approach. Front. ICT 4:17. doi: 10.3389/fict.2017.00017
Received: 04 February 2017; Accepted: 09 June 2017;
Published: 26 June 2017
Edited by: Tom Crick, Cardiff Metropolitan University, United Kingdom
Reviewed by: Gokarna Sharma, Kent State University, United States
Jonathan Gillard, Cardiff University, United Kingdom
Copyright: © 2017 Kshetry and Varshney. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Lav R. Varshney, firstname.lastname@example.org