Network Models and Simulation Analytics for Multi-scale Dynamics of Biological Invasions

Globalization and climate change facilitate the spread and establishment of invasive species throughout the world via multiple pathways. These spread mechanisms can be effectively represented as diffusion processes on multi-scale, spatial networks. Such network-based modeling and simulation approaches are being increasingly applied in this domain. However, these works tend to be largely domain-specific, lacking any graph theoretic formalisms, and do not take advantage of more recent developments in network science. This work is aimed toward filling some of these gaps. We develop a generic multi-scale spatial network framework that is applicable to a wide range of models developed in the literature on biological invasions. A key question we address is the following: how do individual pathways and their combinations influence the rate and pattern of spread? The analytical complexity arises more from the multi-scale nature and complex functional components of the networks rather than from the sizes of the networks. We present theoretical bounds on the spectral radius and the diameter of multi-scale networks. These two structural graph parameters have established connections to diffusion processes. Specifically, we study how network properties, such as spectral radius and diameter are influenced by model parameters. Further, we analyze a multi-pathway diffusion model from the literature by conducting simulations on synthetic and real-world networks and then use regression tree analysis to identify the important network and diffusion model parameters that influence the dynamics.


Background and Motivation
Many natural-and engineered complex systems can be represented as systems of interconnected networks (Buldyrev et al., 2010;Barrett et al., 2013;Bashan et al., 2013). Such representations are very effective at capturing relationships of different types and at different scales between the entities comprising the complex system. There are several classes of such networks depending on the application, including multi-layer networks, multiplex networks, and network of networks, to name just a few. We refer the reader to Kivelä et al. (2014) for a comprehensive list. Here, we study modeling of complex networks arising in the context of multi-scale spatial dynamical processes, such as biological invasions, spread of infectious diseases, and built infrastructure (Balcan et al., 2009;Serrano et al., 2009;Walpole et al., 2013), our focus mainly being on the properties of dynamical processes over such networks.
In recent years, there has been a big thrust toward developing cyberinfrastructures to address problems related to resilient and sustainable agriculture (USDA-NSF, 2017Microsoft, 2020). One of the greatest threats to biodiversity and food security is the increasing rate of biological invasions (Pimentel et al., 2005;Crowl et al., 2008;Pyšek and Richardson, 2010). The annual economic costs arising from environmental damages and the losses caused by invasive species run into billions of dollars (Diagne et al., 2021). Understanding the role of natural processes and human activities in the spatio-temporal spread of biological invasions is of utmost importance (Cunniffe et al., 2015).
A variety of models for biological invasions have been proposed with the goal of studying these complex phenomena at appropriate levels of resolution and fidelity (Pitt et al., 2009;Carrasco et al., 2010;Smolik et al., 2010;Fitzpatrick et al., 2012;Robinet et al., 2012;Ferrari et al., 2014;Chapman et al., 2015;Douma et al., 2016;McNitt et al., 2019;Venkatramanan et al., 2020). In many of these works, biological invasions have been viewed as propagation processes over multi-scale networks. As depicted in Figure 1, there are various pathways or modes of dispersal. The spreading ability of the invasive organism, environmental factors (e.g., winds, ocean currents, and suitable environment), and anthropogenic factors (e.g., production and trade of host crops, and tourism) are among the primary pathways. These pathways affect the spread at multiple spatial and temporal scales. For example, the rate and pattern of self-mediated spread can be significantly different from that of human-assisted spread (Davis et al., 2001;Hoffmann and Courchamp, 2016). The functions or mechanisms that govern the process can vary within and across scales; the flying ability of a pest determines how far it can naturally spread, while trade of host crops and plant material can facilitate long distance spread. Accordingly, there are potentially several ways to control the phenomenon ranging from application of pesticides (farm-level) to trade restrictions (administrative-level).
While the models referenced above have been designed and analyzed from the perspective of biological invasions, there has been very little work on exploring the connection to wellknown formal frameworks developed in network science. A typical modeling effort uses networks to represent movements of species based on data or assumptions regarding the processes involved (or both). The resulting networks are analyzed using basic structural properties such as degree distribution and betweens centrality. Further, suitable diffusion models are used to simulate the spread over this network. It is important to understand these models from a network science perspective in order to (i) make robust predictions, (ii) rigorously address key aspects such as calibration, validation, verification, sensitivity analysis, and uncertainty quantification, and (iii) leverage stateof-the-art algorithms to address important problems pertaining to control, monitoring, and various inference problems in the field of network dynamical systems. The aim of this work is to FIGURE 1 | A multi-pathway network model of a spatial diffusion process in the context of biological invasions. The spread takes place at multiple spatial scales, such as (i) one cell to adjacent cells due to self-mediated spread, (ii) within a locality of high human activity, and (iii) long distance jumps facilitated by trade and travel.
provide the first steps in developing the foundations for these models. Through such formal grounding and abstractions one may also be able to take advantage of existing mathematical and computational theory, including computational paradigms facilitating efficient mapping of models onto high-performance software implementations that exploit modern hardware.
In this paper, we develop an abstract model for multi-pathway spatial processes. For this model, we characterize complex spatial diffusion processes using structural properties of the underlying network. Next, we provide an analysis of the roles of the different pathways on the rate and pattern of spread by applying a complex multi-pathway diffusion model proposed in McNitt et al. (2019), which we refer to as SPREAD (which stands for Simulation tools for Pest Risk analysis accounting for Ecological and Anthropogenic Drivers). Finally, we provide analyses of structural and dynamical properties of networks through the lens of machine learning algorithms to identify the primary drivers of spread. A more detailed description of our contributions is as follows.

A Model for the Multi-Pathway Spatial Network
We define a grid-based model called MPSN for synthetic spatial networks which captures the multi-scale nature of the spread process. The synthetic network is composed of nodes of a grid and metanodes called localities, each of which consists of a unique set of contiguous grid nodes. Its edge set is the union of the edge sets of three different graphs, each corresponding to a different pathway of spread as in Figure 1. The main purpose of proposing such a model is to bridge the gap between simple static network models like Erdős-Rényi or Chung-Lu graphs, and complex real-world networks that describe the interactions in the biological invasion process. These networks, while being significantly more complex than standard models, are still amenable to theoretical and experimental analyses as we demonstrate in this work.

Theoretical Results on Spectral Radius and Diameter
We derive bounds on the spectral radius and diameter of MPSN model. These two graph invariants are important indicators of the diffusion dynamics. The bounds highlight the importance of inter-locality and intra-locality components of the network, which represent the human factors in the spread.

Experimental Analysis of Synthetic and Real-World Networks
We report results from extensive experiments conducted on hundreds of synthetic networks using the MPSN model and several real-world networks. First, we study how the MPSN model parameters influence the spectral radius and diameter. This is followed by running simulations for varying pathway probabilities. In both analyses, to understand the influence of model parameters (i.e., parameters of the network and of the diffusion process) on the extent of spread, we view it as a supervised learning problem. Here the feature vector is comprised of model parameters and the observed variable is a graph invariant (in the structural analysis) or a simulation output (in the dynamical analysis), and we use regression trees and random forests to identify the primary drivers of diffusion process. Our experimental analyses help to identify the regimes where long distance edges and locality size are highly influential in increasing the spread.

Networked Representations of Invasive Species Spread
The current state-of-the-art for modeling invasive species involves developing risk maps using ecological niche models (Venette et al., 2010). Such models account for climate and biology of the invasive species and its hosts to map the long-term establishment potential. They do not provide a causal explanation to the extent and dispersion of spread or explicitly account for human-mediated pathways. However, in recent years, network diffusion models are being increasingly applied to model the spread dynamics of invasive species in order to account for their long distance spread. Hernandez Nopsa et al. (2015) studied the structure of rail networks for grain transport in the United States and Eastern Australia to identify the shortest paths for the anthropogenic dispersal of pests and mycotoxins, as well as the major sources, sinks, and bridges for movement. Sutrave et al. (2012) used an SI model (Easley and Kleinberg, 2010) to county-to-county network to model wind speed and direction, and host density to identify locations to monitor soybean rust, a pathogen. Koch et al. (2014) assess the risk of forest pests due to camping activities. Venkatramanan et al. (2020) used a Bayesian inference method to identify the most likely spread pattern of an invasive pest by modeling the spread as a diffusion process on a time varying network. Carrasco et al. (2010) considered both local and long distance spread in a process-based spatially explicit simulation model to study pest of the maize crop. They use phenology models to estimate pest population size, a negative power law kernel for self-mediated spread, and a gravity model representation of long distance edges. Our work is based on a similar approach by McNitt et al. (2019) who model the multi-pathway spread by accounting for self-mediated spread and spread within and between areas of high human activity (e.g., urban areas). Both works account for suitability of establishment and the distribution of host crop production. Similar modeling approaches have been applied to study infectious diseases in humans and livestock (Ajelli et al., 2010;Kim et al., 2018;Venkatramanan et al., 2021).

Spectral Characterization of Network Dynamics
Several structural properties of networks have been used to understand the progression of a diffusion process. These include basic properties such as degree distribution or clustering coefficient to other properties such as graph spectrum, diameter, and degeneracy, to name a few. The spectrum of a graph is the set of eigenvalues of its adjacency matrix. There are several works that relate spectrum, particularly the first eigenvalue or spectral radius λ 1 (G) of the adjacency matrix of a graph G, to disease spread in SEIR-like epidemic models (Ganesh et al., 2005;Prakash et al., 2012). A well known result that highlights the impact of the network structure on the dynamics is the following: an epidemic dies out "quickly" if λ 1 (G) ≤ T, where T is a threshold that depends on the disease model. This relationship has motivated a number of works on epidemic control where the objective is to find an optimal set of nodes (or edges) to remove from the network that leads to the maximum reduction in its spectral radius (Van Mieghem et al., 2011;Saha et al., 2015;Zhang et al., 2016;Chen et al., 2021).

Diameter and Network Dynamics
The diameter of real-world networks is an important structural parameter used to characterize epidemics on real-world graphs (Holme, 2013;Pastor-Satorras et al., 2015). In the literature, average path lengths between pairs of nodes and diameter of the network have been observed to have an effect on the rate of diffusion. In particular, the lower the diameter, the higher is the diffusion rate (Banos et al., 2015;Taghvaei et al., 2020;Kamra et al., 2021). In spatial networks, the diameter tends to be large when compared to social networks that exhibit the small-world effect (Watts and Strogatz, 1998), and long distance edges can be responsible for bringing the diameter down (Barthélemy, 2011).

Machine Learning and Simulation Systems
From the works described above, it is clear that even simple SIR-like processes (Easley and Kleinberg, 2010) on arbitrary static networks are difficult to characterize. In recent years, there have been several studies that use a combination of extensive simulations and machine learning algorithms to understand the phase space of complex models. Fox et al. (2019) explored multiple ways in which such a nexus of the two modeling approaches can be used to understand complex systems. Lamperti et al. (2018) used extreme gradient boosted trees for the purpose of phase space exploration of a complex model. Our approach to apply classification and regression trees (CART) and random forests (Breiman, 2001(Breiman, , 2017 is motivated by this work. Other methods based on Gaussian emulators have been used for calibrating agent-based models (Fadikar et al., 2018).

Graph Theoretic Concepts
We begin with the definitions of several basic graph theoretic concepts. Additional information regarding these concepts can be found in West (2006); Brouwer and Haemers (2012). We assume that graphs are simple (i.e., they have no multi-edges or self loops).
Given an undirected graph G(V, E), where V = {v 1 , v 2 , . . . , v n }, the adjacency matrix A G is an n × n (symmetric) matrix defined as follows.
if v i and v j are adjacent and 0 otherwise.
We use degree(v i ) to denote the degree of node v i . Also, we use δ(G) and (G), respectively, to denote the minimum and maximum node degrees in G. Consider a connected undirected graph G(V, E) with a nonnegative weight w(e) on each edge e ∈ E. When w(e) = 1 for each e ∈ E, we sometimes refer to G as an unweighted graph. A shortest path between any pair of nodes v i and v j is a path such that the total weight of all the edges in the path is a minimum among all the paths between v i and v j . Let d G (v i , v j ) denote the length of a shortest path between v i and v j . Then, the diameter of G is defined by If G is not connected, then by convention, diam(G) is taken as ∞.
Given an undirected graph G(V, E) and an integer t ≥ 1, the t-th power of G, denoted by G t , is an undirected graph G t (V, E t ), where {v i , v j } ∈ E t iff there is a path with at most t edges between v i and v j in G. Given two graphs G 1 (V 1 , E 1 ) and G 2 (V 2 , E 2 ), the Kronecker product of G 1 and G 2 , denoted by

Matrix Concepts
For any n × n symmetric matrix M with real entries, it is known that all the n eigen values, denoted by λ 1 , λ 2 , . . ., λ n , are real (Brouwer and Haemers, 2012). Since the n×n adjacency matrix of an undirected graph with n nodes is symmetric and all its entries are from {0,1}, it follows that all the eigen values of the adjacency matrix are real. We will assume without loss of generality that these n eigen values are ordered so that λ 1 ≥ λ 2 ≥ . . . ≥ λ n . Thus, λ 1 will be referred to as the first eigen value or the spectral radius.

Geometric Concepts
For any pair of points a = a x , a y and b Euclidean distance between a and b. It is well known that the Euclidean distance satisfies the triangle inequality: for any three points a, b, and c, d Euc (a, b) + d Euc (b, c) ≥ d Euc (a, c). As a generalization of this inequality, we have the following fact: if a 1 , a 2 , . . ., a n are n ≥ 3 points in the plane, then n−1 i=1 d Euc (a i , a i+1 ) ≥ d Euc (a 1 , a n ). We will use this inequality in Section 6.

Motivation
A number of models have been proposed for spatial networks, where nodes and edges are embedded in a d-dimensional space with some geometric constraints. Such a network model acts as a reference model for comparison with real-world networks, and is amenable to theoretical analysis and controlled experimentation. Network models are also a principled approach to approximate partially-known real-world networks and to capture their variability (Gutfraind et al., 2015). Barthélemy (2011) provides a comprehensive review of such models. Lattice graphs and random geometric graphs are examples of some of the simplest spatial networks. More complex models that exhibit properties found in real-world spatial networks have also been proposed. Most of these models have been derived by spatial generalizations of well-known random graph models, such as Erdős-Rényi graphs, the Watts-Strogatz model, and the preferential attachment model Easley and Kleinberg (2010). However, to the best of our knowledge, none of these models is representative of networks arising out of biological spread processes like epidemics of invasive species or infectious diseases. While these network models have a base component that is gridor lattice-like, they are missing a multi-scale component due to human-mediated pathways of spread. At the same time, the domain specific studies are limited to simulations on a single instance of the network that is usually constructed using a combination of available data and spatial interaction models such as the gravity model. Uncertainty quantification and sensitivity analyses are typically limited to diffusion model parameters, not network structure. Here, we address these gaps by developing a model that is a realistic representation of such processes, and analyzing its structural parameters such as spectral radius and diameter that are well-studied in the context of diffusion processes on networks. Later, in Sections 5 and 6, we will further discuss how the network model parameters affect these structural properties.

General Structure
At a high level, some aspects of this spatial network are captured in Figure 1. Here, we will describe the graph class G(V, E) of multi-pathway networks where V is a set of n vertices embedded in some suitable, 2-dimensional space M (e.g., Euclidean space R 2 , the sphere S 2 ). The graph G = G(V, E) is composed of three graphs G S , G L , and G LD , where the subscripts S, L, and LD denote corresponding pathways. With respect to Figure 1, G S corresponds to the self-mediated spread, G L is the spread within a locality, and G LD is the inter-locality spread or long distance jumps. In general, G will be a loop-free, edge-labeled, and directed multi-graph. Edges are labeled by the pathway to which they belong with (u, v, π) encoding the edge from u to v due to pathway π. In the case where G is undirected we will use the notation ({u, v, }, π) for undirected edges, or simply {u, v} when the pathway is clear from the context. Let r denote the dispersal range. The graph G S has vertex set V and an edge between each pair of vertices The graph G LD has vertex set V and its edge set is the union of the sets E f (ℓ, ℓ ′ ) over all edges (ℓ, ℓ ′ ) of F LD .

A Multi-Pathway Spatial Network Model
Here, we define the graph model class called multi-pathway spatial network (MPSN). Its vertex set consists of points on a √ n × √ n square lattice with integer coordinates (i, j), where i, j ∈ {0, 1, 2, . . . , √ n − 1}. We will assume throughout that n is a perfect square. Let k be a positive integer satisfying the following conditions: (i) it is a perfect square and (ii) √ k is a factor of √ n. The vertex set is partitioned into k regions, each inducing a n/k × n/k subgrid. Each region consists of one locality with s vertices, where s has the same parity (odd or even) as n/k. Each locality is again a square grid of size √ s × √ s at the center of that region. For a locality v, let L v denote the set of points on its square grid. Let L denote the set of localities.
As described above, we have three component graphs G S , G L , and G LD . The edge set of G S is determined by the range parameter r. Any two lattice points are adjacent in G S if they are at a Euclidean distance of at most r. The edge set of G L is determined by a template graph H L , referred to as the intra-locality graph, on a √ s × √ s grid. For each locality v, let G v denote the graph on the vertex set L v . Consider the bijection between the vertex sets of G v and H L induced by the natural ordering of their vertices on the √ s × √ s grid. Each G v is isomorphic to H L under this bijection. The graph G L is the union of all graphs G v . Let F LD be a graph defined on the localities, referred to as the inter-locality graph. The long distance human-mediated pathway graph G LD is defined as follows: We will henceforth denote this model by MPSN(n, r, k, s, H L , F LD ). In the following discussion, we will consider scenarios where F LD is sampled from some random graph model denoted by G. Examples of such models are Erdős-Rényi model and the preferential attachment model (Easley and Kleinberg, 2010). In such cases, F LD is replaced by G, and the resulting notation is MPSN(n, r, k, s, H L , G).

A Note on Directed Graphs
For simplicity, we only consider the undirected version of the multi-pathway network model. However, in real-world applications, these are typically directed weighted graphs that do not exhibit symmetry with respect to edge weights [see for example the networks in McNitt et al. (2019) or Carrasco et al. (2010)]. A natural directed version of the MPSN would correspond to G S and G L graphs with every undirected edge replaced by a pair of bidirectional edges and F LD being a directed graph. Note that the resulting graph will be strongly connected.

SPECTRAL CHARACTERIZATION
Here, we provide bounds for the spectral radius of an MPSN instance in terms of the spectral radii of the component networks representing the different pathways. The main objective is to understand the importance of each pathway in a diffusion process through the lens of graph spectra. Our main result is Theorem 5.1, where lower and upper bounds on the spectral radius of the multi-pathway network are expressed in terms of the parameters of the MPSN model.
Theorem 5.1. Let G = MPSN(n, r, k, s, H L , F LD ) be a multipathway spatial network. The spectral radius of G can be bounded as follows: The first term of both bounds in the above result corresponds to self-mediated spread (G S ). We note that this is fully determined by the range parameter r and not by the size of the graph. Also, it increases as the square of the range. From an invasive species perspective, this suggests that a species with strong flying capability can rapidly expand its range. An example of such a recent global invasion is that of the fall armyworm (Westbrook et al., 2016;Day et al., 2017), which is known for its long distance migration. The second term in the bounds corresponds to intra-locality spread. This depends on the size and structure of the localities. A more significant contribution comes from the third term corresponding to the inter-locality spread pathway. It is proportional to the spectral radius of the inter-locality graph F LD . The denser the graph, the greater its contribution. More importantly, it is amplified by (s−1), the size of the locality. The main implication of this is that the higher the number of areas are that are suitable for establishment of the pest in a locality, the greater the influence of the inter-locality pathway. The rest of the section is devoted to a proof of the above theorem. Throughout these proofs, we will denote the edge set of a graph G by E(G).
Lemma 5.2. The spectral radius of the short-distance pathway network G S with n nodes and range r is (r 2 ).
Proof: Note that G S is a network induced on a set of points located on an √ n × √ n grid with integer coordinates and range parameter r. We use the notation u = i, j for each vertex, where i, j ∈ {0, . . . , √ n − 1} are the coordinates. Let H 1 denote the grid graph where each i, j is adjacent to i ± 1, j and i, j ± 1 . Let H 2 denote the graph where each i, j is adjacent to i ± 1, j , i, j ± 1 , and i ± 1, j ± 1 . In the definitions of both H 1 and H 2 , edges to vertices whose coordinates do not exist within the √ n × √ n grid are not added.
To this end, it is enough to prove this for u = 0, 0 and v = x, y since both distances are shift invariant. For this case, d Euc (u, v) 2 = x 2 + y 2 . Since u and v are adjacent in H ⌊r⌋ 1 , v is reachable from u in at most ⌊r⌋ hops in H 1 . This means that x + y ≤ r, with x and y being positive. Therefore, x 2 + y 2 ≤ x 2 + (r − x) 2 ≤ r 2 + 2x(x − r) ≤ r 2 , and E(H ⌊r⌋ 1 ) ⊆ E(G S ). Now, we will show that if x 2 + y 2 ≤ r 2 , then (u, v) ∈ E(H ⌊r⌋ 2 ). Since x 2 + y 2 ≤ r 2 and x and y are integers, we have x, y ≤ ⌊r⌋. Note that in H 2 , d H 2 ( 0, 0 , x, y ) = max(x, y) ≤ ⌊r⌋. This is because, assuming without loss of generality, that x ≤ y, x, y is reachable from 0, 0 by the path 0, 0 1, 1 · · · x, x x, x + 1 · · · x, y , hence E(G S ) ⊆ E(H ⌊r⌋ 2 ), completing our proof of Claim 1. We now recall a result from spectral graph theory (Brouwer and Haemers, 2012).
Proof of Claim 2: To prove Part (i), note that in H ⌊r⌋ 1 , the nodes with minimum degree are the corner vertices of the grid. Recalling that 0, 0 is a corner vertex, and any neighbor x, y satisfies x + y ≤ ⌊r⌋, its degree is ⌊r⌋ (⌊r⌋ + 1)/2.
To prove Part (ii), note that in H ⌊r⌋ 2 , if max(x, y) ≤ ⌊r⌋, then x, y is a neighbor of 0, 0 . There are ⌊r⌋ 2 − 1 such vertices. For sufficiently large n in comparison to r, a non-corner vertex can have a degree of up to 4(⌊r⌋ − 1) 2 + 4(⌊r⌋). This completes our proof of Claim 2.
Lemma 5.4. Consider the local human-mediated spread pathway network G L with n nodes, number of regions k and locality size s. Let H L be the intra-locality graph. The set of eigenvalues of the adjacency matrix G L is identical to the set of eigenvalues of the adjacency matrix of H L .
Proof: By design, G L induces multiple connected components, with each component being isomorphic to H L . The result follows.
Lemma 5.5. Consider the local human-mediated spread pathway network G LD with n nodes, number of regions k and locality size s. Let F LD be the inter-locality graph. The spectral radius of G LD equals (s − 1)λ 1 (F LD ).
Proof: We will use the definition of localities from Section 4.3 and the definition of Kronecker product of graphs from Section 3.
By definition, (a, b) ∈ E(G LD ) ⇐⇒ a ∈ L u , b ∈ L v and (u, v) ∈ E(F LD ). Also, each L u is of the same size s. Let G K denote a graph on s vertices which induces a complete graph. Now we show that the graph G LD is equivalent to the graph H obtained as the Kronecker product of F LD and G K . Since G K has s nodes, we can define a bijection between V(G K ) and V(L) for any locality L. Let this bijection be denoted by π u : V(G K ) → L u . Let (u, x), (v, y) ∈ V(H) where u, v ∈ V(F LD ) and x, y ∈ V(G K ).
Suppose (u, x) and (v, y) are two vertices in H, where u = v, a = π u (x) and b = π v (y). We will now show that {(u, x), (v, y)} is an edge in H iff a is adjacent to b in G LD . Suppose {(u, x), (v, y)} is an edge in H. Since (u, x) is adjacent to (v, y), we have that u is adjacent to v in F LD and x is adjacent to y in G K . Note that since G K is a complete graph, the latter condition will always be true. Since a belongs to locality u and b belongs to locality v, it follows by the definition of G LD that a and b are adjacent as well. Now suppose {a, b} ∈ G LD . Since the two vertices belong to different localities, they can be adjacent only because u is adjacent to v in F LD . Again noting that G K is a clique, x = π −1 u and y = π −1 v are adjacent in G K . Therefore, {(u, x), (v, y)} is an edge in H.
From Lemmata 5.2, 5.4, and 5.5, the proof of Theorem 5.1 follows by (i) noting that the spectral radius of any graph is at least as large as the spectral radius of any of its subgraphs, and (ii) for any two Hermitian matrices A and B (such as the adjacency matrices of graphs), λ 1 (A + B) ≤ λ 1 (A) + λ 1 (B).

DIAMETER
In this section, we derive an upper bound on the diameter of an MPSN instance, where the inter-locality graph is drawn from an Erdős-Rényi model. In particular, we show that the addition of a few long distance edges corresponding to the inter-locality graph can lead to a significant decrease in the diameter. Our main result is as follows.
Theorem 6.1. Let G ∈ MPSN(n, r, k, s, H L , G) be an instance of the multi-pathway network model where F LD ∈ G is an instance of the Erdős-Rényi random graph model G = H(k, ǫ/k) for some ǫ > 0. (i) The diameter of the graph G S (i.e., G without the interand intra-locality edges) is ( √ n/r). (ii) The diameter of G is asymptotically almost surely O n k log k .
This theorem implies that, for a sufficiently large k (number of localities), even when the range r is low, a small number of long distance edges can reduce the diameter by a significant amount. In practice, this reduction in diameter is significant since the value of r is small while k is a much larger integer. From an application perspective, this suggests that even organisms with limited flying ability can cover great distances in a few hops due to long distance trade or travel links. To prove Theorem 6.1, we use the following result on the diameter of randomly perturbed graphs due to Krivelevich, Reichman and Samotij (Krivelevich et al., 2015). Their main observation is that if for some small ǫ > 0, approximately ǫn random edges are added to any n-node graph, then asymptotically almost surely, the diameter of the resulting graph is O(log n). Formally, their result is stated as follows. (Krivelevich et al., 2015)). For every ǫ > 0, there exists C > 0 such that the following holds. Let G be an n-vertex connected graph, choose R ∼ G(n, ǫ n ) and let G * = G ∪ R. Then, asymptotically almost surely, the diameter of G * is at most C log n.

Lemma 6.2 (Krivelevich, Reichman and Samotij
Proof of Theorem 6.1: Part (i): First, we will prove the lower bound on the diameter of G S . On the grid over which G S is defined, consider the two nodes x = 0, 0 and y = √ n − 1, √ n − 1 . The Euclidean distance between these two nodes is d Euc (x, y) = 2(n − 2 √ n + 2). It can be verified that for n ≥ 9, d Euc (x, y) ≥ √ n. Suppose the length of a shortest path P between x and y in G S has ξ edges. By the definition of G S , each of these edges has a geometric length of at most r. Therefore, the total geometric length of all these edges in P is at most rξ . Thus, by the generalized triangle inequality (Section 3), the distance between x and y is at most rξ . As noted above, for n ≥ 9, this distance is at least √ n. Thus, rξ ≥ √ n or ξ ≥ √ n r = ( √ n/r). In other words, there is at least one pair of nodes in G S for which the shortest path uses ( √ n/r) edges. Therefore, the diameter of G is ( √ n/r), and this completes our proof of Part (i).

Part (ii):
Recall that L is the set of localities. By definition, there are k localities placed on a √ k × √ k grid (see Figure 2). We will first create a base graph F B on L by adding edges between neighbors on the grid. More formally, we will assume that the coordinates of each locality x, y satisfy x, y ∈ {0, 1, 2, . . . , √ k − 1}. In F B , a locality with coordinates x, y is adjacent to x ± 1, y ± 1 (if nodes with those coordinates are in the same locality). Noting that F B is connected and by assumption, F LD ∈ H(k, ǫ/k), from Lemma 6.2, it follows that the diameter of the graph F LD ∪ F B obtained by adding the edges of F LD to F B is asymptotically almost surely at most c log k for some positive constant c. Now, we will prove the upper bound on the diameter of G. Let u, v ∈ V be two distinct nodes on the grid. Let L u and L v be the localities closest to u and v respectively. The distance from u to the closest vertex u ′ ∈ L u is at most most 2( n/k − s)/ r √ 2 . The same holds for v. As proven above, there exists a path P of length at most c log k almost surely from u to v in F LD ∪ F B . Let P = u − u 1 − u 2 − · · · − u t − v. Some of the edges in this path belong to F B and the rest to F LD . If u i u i+1 belongs to F LD , then by definition of G LD , there exist corresponding nodes u i ∈ L u i and u i+1 ∈ L u i+1 such that u 1 and u 2 are adjacent in G LD and, therefore, are adjacent in G. If on the other hand, u i and u i+1 are adjacent in F B , then, any two vertices u i ∈ L u i u i+1 ∈ L u i+1 are at a distance at most 2( n/k − s)/ r √ 2 + 2 diam(G L ). The first term, which corresponds to inter-locality distance on the grid, is clearly O( n/k). The second term corresponds to node-tonode distance within a locality. Note that diam( Noting that s ≤ √ k and k ≤ √ n, it can be verified that diam(G L ) = O( n/k). Thus, the length of each edge in P is O( n/k) and the number of edges in P is asymptotically almost surely at most c log k. It follows that the length of path P from u to v is asymptotically almost surely O n k log k . This completes our proof of Part (ii).

A MULTI-PATHWAY DIFFUSION MODEL
We now describe briefly the model that is based on the approach of McNitt et al. (2019), referred to as the SPREAD model. We refer to Figure 1 and Section 4 for the network representation on which the diffusion process is based. There are three pathways of spread. Self-mediated dispersal corresponds to diffusion from one node to its neighboring nodes, intra-locality dispersal is diffusion within a locality (farmer-market interactions), and inter-locality diffusion corresponds to long distance dispersal from one cell to another (trade). The diffusion model is a discretetime SEI process where a node transitions from E to I after ℓ time steps. The transition from S to E is described below.
Each node v has a periodic time-varying attribute, namely its infectivity ρ(v, t). The probability that a node can be infected through a pathway is modeled as a negative exponential function of infectivity and pathway parameters, which can be expressed as edge weights between two nodes as follows. For the shortdistance dispersal, the probability that node v is infected by any "neighbor" v ′ that is at a distance at most r (range) is given by where P s is the edge label corresponding to the pathway and α s is a tunable pathway parameter. For two nodes v and v ′ within a locality, the probability of within intra-locality transmission from v ′ to v is given by where P ℓ is the pathway label and α ℓ is the tunable pathway parameter. For the inter-locality transmission, the weights on the flow network F LD influence the probability of transmission.
Suppose v ∈ L v and v ′ ∈ L v ′ ; then the probability that v is infected by v ′ through this pathway is given by where P ℓd is the pathway label and α ℓd is the tunable pathway parameter. As mentioned earlier, the dynamics of the SPREAD model follow the Susceptible-Exposed-Infectious (SEI) (Marathe and Vullikanti, 2013), as defined below. Each node is in one of the following states: susceptible (S), exposed (E), or infectious (I). Let S ⊆ V denote the initial set of nodes in state I. The nodes in S serve as the seeds for the diffusion process. At any time t, each node u in state I infects its susceptible neighbor v with probability equal to the weight w(u, v, P, t) via the labeled edge (u, v, P). An exposed node (i.e., a node in state E) transitions to the infectious state after ℓ time steps, where ℓ is the latency period. It corresponds to the time taken for the node to transition from being exposed to being infectious.

Outline
The extent and pattern of spread in a network depend on both the network structure and the diffusion model properties. Our goal here is to identify the drivers of the diffusion process. We will study how combinations of these properties affect the spread in the network. Specific studies are as follows.
1. We study how MPSN model parameters such as range, number of localities and the density of the intra-and interlocality graphs affect the structure of the network from the perspective of dynamics. Specifically, we analyze the evolution of the spectral radius (λ 1 (G)) and diameter (diam(G)) with respect to model parameters. 2. We analyze the networks considered above with respect to the SPREAD diffusion model. Here, the objective is to study how combinations of the pathway probabilities determined by α s , α ℓ , and α ℓd and network model parameters affect the spread. We also study how combinations of structural and dynamical properties influence the spread. Further, we investigate how representative network parameters (such as spectral radius and diameter) are in characterizing SEIlike diffusion processes especially when non-uniform edge probabilities are involved. 3. Based on the insights derived from the above two studies, we analyze the properties of several real-world commodity flow networks (McNitt et al., 2019). These networks are temporal and edge-weighted unlike MPSN considered in the two studies Inter-locality graph F LD G(n, p) ǫ: (inter-locality graph edge probability ǫ/k) 0.1, 0.5, 1, 5, 10 Each of these networks is constructed using various datasets on production, trade, and consumption of tomatoes (McNitt et al., 2019). These networks are edge-weighted. above. We analyze the structural properties of the temporal snapshots of these networks.

Networks and Experiment Design
All the code used to generate the results is made publicly available (Adiga, 2021). The parameters used to generate synthetic networks corresponding to the MPSN model and the characteristics of real-world datasets corresponding to domestic trade of tomatoes are provided in Tables 1, 2, respectively. For the synthetic networks, we considered a 64 × 64 grid of nodes. Using a full factorial design, networks were constructed for combinations of parameters in Table 1. For the inter-locality graph F LD , we used the Erdős-Rényi random graph model where the edge probabilities were chosen to be functions of the number of nodes of F LD , which in turn is equal to the number of regions k. However, not all combinations of parameters are valid. Since these networks have a random graph component, for each valid combination of parameters, we constructed 10 replicates. Thus, over 7,000 networks were constructed. We note that while the synthetic networks considered here are undirected and unweighted, the real-world networks considered ( Table 2) are directed and edge-weighted. In addition, these temporal networks are also periodic: there are 12 snapshots of networks, one for every month of the year induced by the seasonal production and flow of commodity around the year. The details of network construction are given in McNitt et al. (2019). Structural properties were computed for all the synthetic networks. A subset of these networks was used for dynamical analysis using the diffusion model presented in Section 7. For this, a range of parameter values corresponding to the SPREAD diffusion model of Section 7 were used. This is listed in Table 3. The number of simulation instances in each case was 100. Since diffusion is modeled as an SEI process, assuming that the network is strongly connected, the fixed points or the equilibrium points of the system are either all nodes being in the infected state or all nodes being in the susceptible state. Therefore, we are interested in the state of the system at specific time steps. The observed variable is the mean number of nodes infected by a given time horizon or by time step T. For the initial seeding, five percent of the nodes were chosen uniformly at random and set to the infected state.

Structural Analysis of MPSN Model
The decision tree analysis of spectral radius λ 1 (G) in Figures 3, 4 indicates that the primary drivers of its value are locality size s and the inter-locality edge probability factor ǫ. The significance of the former can be attributed to the model. We recall that the dominating term in the bounds provided in Theorem 5.1 is λ 1 (G LD ) = λ 1 (F LD )(s − 1) consisting of the locality size and ǫ that influences λ 1 (F LD ). For Erdős-Rényi graphs, the spectral radius asymptotically tends to the maximum of ǫ and the square root of the maximum degree of the graph (Krivelevich and Sudakov, 2003). We note that even though range r contributes to the value of λ 1 (G), its influence is quite small compared to the other parameters. Here, we recall that locality size s, λ 1 (H L ) and λ 1 (F LD ) are factors related to human-mediated dispersal. Under the assumption that the spectral radius is an indicator of rate of dispersal, we can see the significance of these components with respect to the diffusion process. We also note that for larger values of locality size and ǫ, the number of regions k contributes to the value of the spectral radius as the maximum degree of F LD increases with k.
While diameter, like spectral radius, is closely related to dynamics, we observe in Figures 3, 4 that the most significant influence on its value is due to the range r; this is closely followed by the number of regions k and ǫ. We note that doubling the range reduces the diameter by approximately half. Greater the number of regions and ǫ, the greater is the number of long distance edges. Hence, the diameter goes down. We note that unlike the case of spectral radius, locality size is not an important factor with respect to diameter.

Dynamical Analysis of MPSN Model
These experiments correspond to diffusion using the SPREAD model on different MPSN instances and different combinations of pathway parameters α s , α ℓ , and α ℓd . The observed variable is the mean number of infections that have occurred by a given time T. The results of our experiments are in Figures 5, 6, where the progressions of the diffusion process under different conditions are plotted. A general observation is that, when the number of infected nodes is much greater than √ n, the rate of FIGURE 5 | Unraveling pathways: Here, we choose a positive value 0 ≤ c < 1. All pathway parameters α s , α ℓ , and α ℓd are either set to 0 or c. In the experiments, we incrementally unravel the pathways starting the edges from the grid (α s = c, α ℓ = 0, α ℓd = 0), followed by intra-locality edges (α s = c, α ℓ = c, α ℓd = 0), and finally, inter-locality edges (α s = c, α ℓ = c, α ℓd = c). For all the cases where α ℓd = 0, the variance is higher compared to the remaining cases as these also account for 10 replicates of the networks.
spread is linear. This is because, for any infected vertex, most or all of its neighbors are infected. In each time step, at the forefront of the spread process, at most a constant times √ n vertices are available to be infected in the next time step.

Progression of the Diffusion Process
Our first set of experiments concern the importance of pathway parameters given a network. Here, we fix a transmission probability on each edge. Using the pathway parameters, we first activate only the edges of G S . This is the baseline process of diffusion over just the grid. From the domain perspective, this corresponds to natural or self-mediated spread alone. Then, we unravel the edges corresponding to G L , followed by the edges of G LD . In Figure 5, the top left plot is the reference plot corresponding to MPSN with s = 16, ǫ = 1, and r = 2. We see that when only α s > 0 or in other words, only the grid edges are activated, the spread is slow in the beginning followed by an increase in the rate before it saturates. A similar phenomenon is observed even when the intra-locality and interlocality edges are activated. We note that the rate of increase is higher when the inter-locality edges are introduced; this highlights the importance of long distance edges in increasing the rate of spread. We compared the reference plot with diffusion on networks where ǫ = 0.1 (top middle in Figure 5) and ǫ = FIGURE 6 | Non-uniform pathway probabilities: In each figure, there are three sets of plots. In each set, two pathway parameters are fixed while the remaining pathway parameter value is varied from 0 to 0.01.
10.0 (top right). For ǫ = 0.1, the inter-locality edges do not contribute to the infections, while for ǫ = 10, this pathway is dominant. In addition, the progression of the process is very fast in the beginning and saturates quickly, after which the increase is almost linear. This indicates that the long distance edges contribute to fast short-term spread, an important factor to account for in preparing for interventions.

Importance of Locality Size and Range
The bottom left plot in Figure 5 corresponds to a smaller locality size. We see that locality size significantly affects both intraand inter-locality spread. Reduction in locality size to 4 makes these pathways insignificant. With increase in range (bottom right), the diffusion through short distance pathway is rapid, and, therefore, the process reaches saturation before either the intraor inter-locality pathways can make a difference.

Non-Uniform Probabilities
In Figure 6, we fix two pathway parameters while the third parameter is varied. We observe that the highest increase in the rate of spread corresponds to the short-distance parameter α s and the least is α ℓ , the intra-locality parameter. We also note that larger the ǫ, the greater the number of long distance edges and, therefore, the higher the increase in the rate of spread as α ℓd is increased (see top row of Figure 6).

Regression-Tree Analysis
We applied regression tree and random forest algorithms on the simulation data. The first objective was to find the network and model parameters that drive the infection. The second objective is to determine how structural parameters such as spectral radius and diameter influence the spread. Accordingly, we have two sets of results in Figures 7, 8, respectively. The regressiontree analysis succinctly captures some aspects of the results in Figures 5, 6. As observed in the earlier plots, the rate of spread is determined primarily by α s and range r. For smaller time steps, the inter-locality spread is significant, particularly when α s and r are small (Figure 7). Therefore, ǫ and locality size s are significant predictors of spread. However, in the later time steps, due to saturation, only the parameters that affect short-distance spread are good predictors of spread.

Spectral Radius, Diameter, and Number of Infections
In Figure 8, we note that spectral radius features only in the first plot. This is again because spectral radius is determined primarily by inter-locality parameters. However, for later time steps, only diameter and α s determine the mean number of infections. Again, this just reflects the fact that diameter and range are correlated. Therefore, spectral radius is a good predictor of spread during the initial phase (either first few time steps or when the number of infections is small).

Robustness of Regression Tree Analysis
We note that the outputs of the random forest algorithm are influenced by hyper parameters, such as number of estimators (or trees), maximum depth, minimum number of samples required to split an internal node, etc. In Figure 9, we have plotted the parameter importance based on increase in node purity (Gini index) and increase in mean square error for node size and number of trees. We note that the results are consistent.

Analysis of Real-World Commodity Flow Networks
For the real-world networks, we computed the spectral radius and diameter for the following graph. We first decomposed the network into 12 components, one for each month of the year. In the SPREAD model of Section 7, we note that in Equations (1)-(3) the probability of transmission is dependent on infectivity ρ(v, t). For the short-distance graphs G S and G L , every outgoing edge was assigned a weight of ρ(u, t), where u is the source node. For G LD , every outgoing edge was assigned the weight of ρ(u, t)F uv , where u is the locality to which the source u belongs and F uv is the edge weight on the edge {u, v} in the inter-locality graph F LD . The (weighted) adjacency matrices of these graphs were added together and the spectral radius of the resulting matrix is computed. Note that the spectral radius is a real number as the matrix corresponds to a strongly connected directed graph. This follows from the Perron-Frobenius theorem (Brouwer and Haemers, 2012). We call this the weighted spectral radius. We also computed the spectral radius of the graph without the weights (i.e., the weight is 1 iff the ijth entry of the adjacency matrix is non-zero). This is referred to as the unweighted spectral radius. The diameter was computed for the unweighted graph.
The results are in Figure 10 where the three structural properties are plotted for different months and two range values, namely 1 and 2. We observe that the weighted spectral radius is very high for BD compared to other networks. In all networks, we see variation in the spectral radius. This is due to seasonal fluctuations in the production (which affects ρ) and, therefore, in trade (which affects the long distance edges). The PH network shows highest variation in the case of unweighted spectral radius. This parameter is sensitive to the presence or absence of edges.
are several additional directions for future research. First, one can investigate the role of other structural parameters of the graphs generated by multi-pathway model (e.g., clustering coefficient, k-core sizes, closeness centrality (Easley and Kleinberg, 2010)) in determining properties of the spread process. A second direction is to investigate the suitability of other epidemic models proposed in the literature (Marathe and Vullikanti, 2013) for biological invasions. A third direction is to incorporate more realistic models like the gravity model and scale-free graphs for inter-locality spread. Finally, it may be of interest to refine the abstract multi-pathway model to allow the generation of temporal networks which play an important role in practice.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here at: https://github.com/rmuniappan/SPREAD_ model.

AUTHOR CONTRIBUTIONS
AA defined the scope of the research. AA, NP, and YB designed and conducted the experiments and analyzed the results. AA, SR, and HM contributed to the theoretical formulation and results and wrote the paper. All authors contributed to the article and approved the submitted version.  -1908530, NSF Expeditions in Computing Grant CCF-1918656, and NSF CINES OAC-1916805.