Prediction of Pig Trade Movements in Different European Production Systems Using Exponential Random Graph Models

In most European countries, data regarding movements of live animals are routinely collected and can greatly aid predictive epidemic modeling. However, the use of complete movements’ dataset to conduct policy-relevant predictions has been so far limited by the massive amount of data that have to be processed (e.g., in intensive commercial systems) or the restricted availability of timely and updated records on animal movements (e.g., in areas where small-scale or extensive production is predominant). The aim of this study was to use exponential random graph models (ERGMs) to reproduce, understand, and predict pig trade networks in different European production systems. Three trade networks were built by aggregating movements of pig batches among premises (farms and trade operators) over 2011 in Bulgaria, Extremadura (Spain), and Côtes-d’Armor (France), where small-scale, extensive, and intensive pig production are predominant, respectively. Three ERGMs were fitted to each network with various demographic and geographic attributes of the nodes as well as six internal network configurations. Several statistical and graphical diagnostic methods were applied to assess the goodness of fit of the models. For all systems, both exogenous (attribute-based) and endogenous (network-based) processes appeared to govern the structure of pig trade network, and neither alone were capable of capturing all aspects of the network structure. Geographic mixing patterns strongly structured pig trade organization in the small-scale production system, whereas belonging to the same company or keeping pigs in the same housing system appeared to be key drivers of pig trade, in intensive and extensive production systems, respectively. Heterogeneous mixing between types of production also explained a part of network structure, whichever production system considered. Limited information is thus needed to capture most of the global structure of pig trade networks. Such findings will be useful to simplify trade networks analysis and better inform European policy makers on risk-based and more cost-effective prevention and control against swine diseases such as African swine fever, classical swine fever, or porcine reproductive and respiratory syndrome.

In most European countries, data regarding movements of live animals are routinely collected and can greatly aid predictive epidemic modeling. However, the use of complete movements' dataset to conduct policy-relevant predictions has been so far limited by the massive amount of data that have to be processed (e.g., in intensive commercial systems) or the restricted availability of timely and updated records on animal movements (e.g., in areas where small-scale or extensive production is predominant). The aim of this study was to use exponential random graph models (ERGMs) to reproduce, understand, and predict pig trade networks in different European production systems. Three trade networks were built by aggregating movements of pig batches among premises (farms and trade operators) over 2011 in Bulgaria, Extremadura (Spain), and Côtes-d'Armor (France), where small-scale, extensive, and intensive pig production are predominant, respectively. Three ERGMs were fitted to each network with various demographic and geographic attributes of the nodes as well as six internal network configurations. Several statistical and graphical diagnostic methods were applied to assess the goodness of fit of the models. For all systems, both exogenous (attribute-based) and endogenous (network-based) processes appeared to govern the structure of pig trade network, and neither alone were capable of capturing all aspects of the network structure. Geographic mixing patterns strongly structured pig trade organization in the small-scale production system, whereas belonging to the same company or keeping pigs in the same housing system appeared to be key drivers of pig trade, in intensive and extensive production systems, respectively. Heterogeneous mixing between types of production also explained a part of network structure, whichever production system considered. Limited information is thus needed to capture most of the global structure of pig trade networks. Such findings will be useful to simplify trade networks analysis and better inform European policy makers on risk-based and more cost-effective prevention and control against swine diseases such as African swine fever, classical swine fever, or porcine reproductive and respiratory syndrome. inTrODUcTiOn Movements of animals play a key role in the spread of several major infectious diseases, like foot-and-mouth disease, classical swine fever, or African swine fever (1)(2)(3). Therefore, detailed data on livestock movements may help to better simulate transmission dynamics and identify areas, periods, and farms that are more likely to spread the diseases and could be targeted to improve surveillance and control strategies (4,5). However, one of the challenges of using livestock movement data to support decision-making in preventive veterinary medicine is the limited availability of timely and updated records on animal movements and, if available, the massive amount of data that have to be processed. This is particularly challenging when considering diverse and, sometimes, epidemiologically complex, production systems, such as backyard or extensive systems, where the information may not be frequently collected and accessible. Models of livestock movement networks based on holding characteristics and past-temporal observed networks could be useful to simplify real-world networks and to predict disease spread even in backyard or extensive environments. Pig trade movements can be represented as a network, consisting of a set of nodes (here the pig premises) connected by links (also called edges) representing movements of pigs between them. These networks are not strictly identical from 1 year to the following, but their structural properties, which impact disease dynamics, are likely to be stable over time (6,7). These properties emerge from pig trading behaviors. For example, some premises may be more likely to trade with each other due to geographical proximity or because they belong to the same pig company [selective mixing or homophily, see Morris (8)]. Some particular types of premises may also be more likely to trade with a high number of premises (attributes that influence degree). Finally, if a trading partner B of premises A trades with a third premises C, this might encourage A to trade with C (structural balance effect).
The first statistical models developed to evaluate which processes lead to observed network structures were quite simple. They only addressed relational reciprocation [i.e., mutuality; see Holland and Leinhardt (9)] or assortative mixing (8). The recent developments of exponential random graph models (ERGMs), also known as p* models (10), offer possibilities to better capture the complexity of real-life networks (11). This family of models assumes that the observed network is only one realization among many potential networks with similar characteristics and that the probability that a link exists is a logit-linear function of predictors that reflect node characteristics, link characteristics, and network structural properties (10,12,13). Although they were developed to handle the inherent non-independence of network data, the results of ERGMs are interpreted in similar ways to logistic regression, making this a very useful method for examining contact networks in the context of epidemiology.
The aims of this paper were to use ERGMs to (1) develop models that reproduce observed pig trade networks; (2) understand the mechanisms that underlie the organization of pig trade networks; and (3) predict trade networks structures in three different European pig production systems (i.e., industrial, extensive, and backyard). Results of this study are intended to inform the design of prevention and control programs for swine diseases such as African swine fever, classical swine fever, or porcine reproductive and respiratory syndrome under diverse epidemiological scenarios and pig productions systems in Europe.

Data collection and network construction
Three areas were selected to represent different European pig production systems: Bulgaria, where most premises raise pigs for own consumption; the autonomous community of "Extremadura, " which is the cradle of extensive Iberian pig production in Spain; and the department of "Côtes-d'Armor, " which is the French department with the highest concentration of industrial pig premises.
Data on pig movements and premises characteristics were obtained from national databases, through Bulgarian Food Safety Agency in Bulgaria, the professional database of swine (La Base de Données Professionnelle Porcine-BDPORC) in France, and the Ministry of Agriculture, Food and Environment (MAGRAMA) in Spain. The year 2011, which was common in all databases, was retained for analysis. The premises characteristics available were the classification or type of farm (described in the next two sentences), the size of premises (i.e., number of sows, weaners, and finishers on farm), the type of housing system (i.e., indoor or outdoor), the geographical coordinates, and the pig company number (only for France). In Bulgaria, pig farms were classified as small producers (<10 pigs kept for own consumption), type B (medium-size: 10-500 pigs; with low biosecurity level: access to other pigs or feral pigs, use of swill feeding, no fences around the holdings, and/or no disinfection at the entrance and exit of buildings), type A (medium-size, high biosecurity level), or industrial farms (large size: >500 pigs; high biosecurity level) (14,15). Traditionally and outdoor-raised East Balkan pig herds are also found in the South East of Bulgaria. For Spain and France, pig farms were classified as multipliers (premises that produce breeding stocks and semen), farrowing farms, farrow-to-finish farms, finishing farms, or small producers. Small producers for Spain were defined as those that produce pigs for own consumption, whereas for France were those with ≤4 pigs. Traders, collection centers, markets, fairs, and stopping points (i.e., or staging point: locations used to feed, water, rest, accommodate, care for, and dispatch animals in transit before arriving to their final destinations) were considered as trade operators. Because of the dead-end characteristics of slaughterhouses, these premises were excluded from analysis.
For each area, yearly networks (i.e., using year 2011) were built, the nodes being all pig premises of the study areas, even those that were not trading pigs during the study period. Movement data were aggregated over the study period, and a direct link was drawn whenever a shipment of pigs occurred between the corresponding premises. Movement imported from or exported to outside areas was excluded from the analyses. # of edges L(y) # of in-and outgoing edges for each type of production, housing system, pig company

Mi,v,a(y), Mo,v,a(y)
# of edges that are within housing systems, within pig companies, within regions, with differential homophily # of edges that are within housing systems, within pig companies, within regions, with uniform homophily # of edges that are within and between housing systems, within and between type of productions, within and between regions

Sa,v(y)
Euclidean distance between pairs of premises E(y) # of isolates I(y) # of asymmetric links A(y) Geometrically weighted dyadwise shared partners gwdsp(y, α) Geometrically weighted edgewise shared partners gwesp(y, α) Geometrically weighted in-and out-degree distribution gwid(y, α), gwod(y, α) a Some statistics use attribute-specific terms where a and v represent the attribute and level, respectively. The observed network is represented by y and the scale parameter by α. The ergMs Exponential random graph models specify the probability of any random network Y given a set of n nodes and their attributes as in Eq. 1.
The zk(y) terms represent model covariates, any set of K network statistics calculated on the y observed network and hypothesized to affect the probability of this network forming. The model covariates can include network parameters that account for the frequency of occurrence of certain network configurations (e.g., two-path, triangles), as well as node or edgewise covariates like the pig company to which a premise belongs or the distance between two premises, respectively. The θ coefficients estimate the strength of the effect of each covariate. The denominator c represents the normalizing constant, which correspond to the ) over all possible networks with n nodes. Because ERGMs' calculation time dramatically increased with the increase of network size, it was decided to exclude isolates, i.e., pig premises that did not trade with other premises, from the small-scale productions system (Bulgaria, initially 28,729 premises, of which 95.3% were isolated premises).

Model specification
First, an exploration of network data was undertaken, with the computation of several local topological measures (number of isolates, triangles, degree distribution, etc.) and of mixing matrices for premises' attributes (16). Specifically, we computed the number of nodes, the network density, the percentage of isolates, the clustering coefficient, and the mean and range of in-degree and out-degree centrality measures [e.g., Ref. (5)]. Network graphs were plotted, with the nodes colored according to nodes' attributes, to better visualize the selective mixing patterns.
Based on this exploration, several network statistics were chosen to represent hypothetic rules for trade movements ( Table 1). L(y) captures the density of the observed network y. Mi,v,a(y), Mo,v,a(y), Ha,v(y), Ua,v(y), Sa,v(y), and E(y) are attributespecific terms that capture the way in which premise attributes structure trading patterns, where a represents the attribute (e.g., housing system) and v the level (e.g., indoor, outdoor). The main effects, Mi,v,a(y) and Mo,v,a(y), allow variation in the propensity of a premise to form incoming and outgoing edges according to the level of an attribute characterizing this premise. Ha,v(y) models a tendency of edges to occur between premises belonging to the same attribute level that varies among attribute levels (hereafter referred to as differential homophily), while Ua,v(y) models a uniform tendency of edges to occur between premises belonging to the same attribute level (hereafter referred to as uniform homophily). Sa,v(y) accounts for variation in the occurrence of edges according to the levels of an attribute characterizing each of two premises (hereafter referred to as selective mixing). E(y) captures variation in the propensity of premises to form links according to the Euclidean distance in km to other premises.
A(y) and I(y) model the tendency of premises to form unidirectional links or no links, respectively. The terms gwdsp(y, α), gwesp(y, α), gwid(y, α), and gwod(y, α) are related to local structures and represent the parametric forms of the alternating twopaths, clustering (alternating k-triangles) and in-and out-degree distributions, respectively. A fixed value of 0.5 was adopted for the scale parameter α in these terms (11).
The Markov Chain Monte Carlo (MCMC) algorithm was used to estimate the maximum likelihood for the θ coefficients included in models (12). The MCMC chain is intended to step around the sample space of possible networks, selecting a network at regular intervals to evaluate the statistics in the model. For each MCMC step, n (n = 1 in the simple case) toggles are proposed to change the dyad(s) to the opposite value. A chain burn-in of 10 5 toggles, an MCMC sample size of 10 4 , and an interval between successive samples of 10 3 was fixed for these models.

Model selection and goodness of Fit
For each area, four models were built: (1) a simple Bernoulli model that only includes the number of edges; (2) a model with edges and statistics based on nodal attributes (hereafter called "edge + attribute" model); (3) a model with edges and structurerelated statistics ("edge + network statistics" model); and (4) a model with edges, nodal attributes, and structure-related statistics ("edge + attributes + network statistics" model).
For the "edge + attribute" and "edge + network statistics" models, univariable analyses were performed first. The terms (i.e., attributes and network statistics) were then added one by one, until the best model fit was obtained. The fourth model was based on the final "edge + attribute" model, and network statistics terms were added one by one manually until the best model fit was obtained.
Three approaches were used to examine goodness of fit of the models: (1) check for model convergence and degeneracy; (2)    comparison of Akaike information criteria; and (3) comparison of goodness of fit plotting for higher order statistics (11). For this purpose, four sets of statistics were used: the in-and out-degree distributions, the geodesic distance distribution, and the edgewise shared partner distribution, which reflects the clustering of the network (17). These statistics were chosen because of their impact on disease spread dynamics (18). Finally, plots of simulated networks were visually compared to the plot of the observed networks.

resUlTs
A total of 7,811 out of the 45,224 premises keeping pigs (i.e., 17.3%) were actively moving pigs during 2011. Description of the pig industry demographics (i.e., number of premises for each type of farm), pig trade (to/from different types of farm), and topological characteristics in Côtes-d'Armor (France), Bulgaria, and Extremadura (Spain) in 2011 are presented in Tables 2-4.
The inclusion of both nodal attributes and network configurations statistics provided the best fit to the data (Tables 5-7; Figures 1 and 2). Selective mixing between premises according to their type of production appeared to be an important mechanism of pig organization, whichever system considered (Tables 5-7). In addition to this mechanism, the other mechanisms related to premises characteristics that impacted the most on trade organization were belonging to the same pig company, the tendency of outdoors premise to trade with outdoor premises, and the geographical location of pig premises, in the intensive, extensive, and small-scale production systems, respectively (Tables 5-7). Network statistics on dyadwise and edgewise shared partner distributions, as well as on in-or out-degree distributions were needed to fit the models (Tables 5-7). These statistics better reflected the clustering of the observed networks and allowed to reproduce well the observed global network properties (Figures 1 and 2). This can be clearly observed in the goodness of fit diagnostics plot (Figure 2), where the value of all the observed network statistics (solid black line) is only well captured by the distribution of values of the simulated networks (underlying boxplots) generated with the final ERGM model (i.e., model with edges + attributes + network statistics).

DiscUssiOn
Exponential random graph models were used to represent, understand, and predict pig trade networks structures from different European production systems, with predominantly small-scale, extensive, or intensive pig producers. Such information improved our understanding of the processes that govern the organization of pig trade and can further be used to better inform European policy makers on prevention and control Table 6 | Parameter coefficients and fit for the four exponential random graph models (ergMs) of pig trade in an extensive production system (spain-extremadura).
covariates ergM coefficients (se) a bernoulli (edges) edges + attributes edges + network statistics edges + attributes + network statistics measures against swine diseases such as African swine fever, classical swine fever, or porcine reproductive and respiratory syndrome. Rapidity of targeted action during the initial phase of an outbreak is fundamental to effectively curtail the transmission and minimize the disease burden. At this time, movements of animals have not been banned, and it is thus relevant to use "peace-time" movement networks to compare different control strategies. Until recently, variability in contact patterns was mostly approached in epidemic models by combining probabilities of contact between premises according to their type of production and to the distance between premises (22,23). These efforts may fail to capture the structural properties of livestock trade networks that will impact diseases dynamic as well as their spatial spread (24)(25)(26). The use of ERGMs to model pig trade networks allows capturing both network topology and complex behaviors that depend on various premises characteristics. They may thus help to generate more realistic networks that may be used to study diseases spread, identify premises that could be targeted for risk-based surveillance, early detection, and rapid control of diseases, and compare different control strategies (27)(28)(29). Indeed, by simulating diseases spread on several simulated networks, we could identify some farms that are frequently and early infected and thus that should be targeted to provide timely and accurate indications of epidemic activity (30,31). These networks could also be simulated to  strongly structured pig trade organization in the small-scale production system, whereas belonging to the same company, or keeping pigs in the same housing system appeared to be key drivers of pig trade, in intensive and extensive production systems, respectively. As expected, the specialization and organization of pig production also explained a part of trading behaviors, illustrated by the heterogeneous mixing between types of production. This mechanism was however less important than the geographical location of premises and as important as belonging to the same pig company in the small-scale and intensive production systems, respectively. Geographical proximity did not appear to play a role in the intensive system, whereas it was significant in the small-scale producer system. Unfortunately, model degeneracy occurred when trying to include the distance between premises as a covariate in the extensive production system (Spain-Extremadura), preventing a conclusion to be drawn on the impact of this covariate in this production system. Finally, this study revealed that the inclusion of nodal attributes was necessary to represent the mixing patterns, but it was not sufficient to reproduce the great clustering observed in the pig trade networks, which could only be represented when adding additional statistics on local network configurations. These statistics reveal some features of these networks, such as the propensity of trade to have a short path length (negative coefficient for the degree distribution terms). Some social behavior, economic factors, or unobserved covariates, which may differ between countries, may also have driven the choices of farmers for trading partners (e.g., pig prices, road distribution, traditions, or cultural practices, etc.). This may explain the increased clustering as represented by the positive coefficient for the gwesp term.
Until recently, problems of degeneracy and computational intractability for large network sizes limited the use of ERGMs in epidemiological modeling (11,34). Indeed, ERGMs have been mainly used on small networks to understand the factors driving human social behaviors (35,36) and have sometimes been applied on disease transmission modeling (37). Ortiz-Pelaez et al. (38) were the first to introduce ERGMs in preventive veterinary medicine, using this method to understand the factors driving livestock trade in a small network of villages in Ethiopia. In the present study, the use of new parameters that limit degeneracy problems (12) allowed us to obtain statistical models with a good fit to the large-size observed networks. Isolates always depends on the spatial and temporal "frontiers" that are decided and on the exchange with farms outside these "frontiers. " For Extremadura, most of movements were inside the region (i.e., from the 9,544 isolates, 174 of them sent pigs to farms outside Extremadura and 73 received pigs from farms outside Extremadura); therefore, most of the isolates (97.4%) can be considered as true isolates during the study period (i.e., not movements within the region and not trading with other regions). For Côtes-d'Armor, there was a lot of exchange with other departments, and only 47% of the 489 isolates can be considered as true isolates (i.e., 57 sent pigs to farms outside Côtes-d'Armor and 202 received pigs from farms outside the region). Therefore, the scale to simulate networks for disease spread models should consider areas where almost all movements are inside these areas. For Bulgaria, the entire country was evaluated, and all were considered to be true isolates; however, it was not feasible to include these isolates in the ERGM due to memory limits, and therefore the model produced here might not fully represent the true pig movement network in Bulgaria. Further studies should be conducted to validate this network once computational difficulties to fit ERGM's to large networks are solved.
Since implementation of Regulation (EC) no 1760/2000 of the European parliament, recording of livestock movements between premises is mandatory, making data on pig trade movements available, at least in the main producing countries in the EU. However, there are no standards on the definition of different types of premises (e.g., backyard) or on other premises attributes. The scales of the networks considered in this study were also different, being the national level for Bulgaria and the regional level for France and Spain, to better study very specific production systems. Therefore, though the analyses and results in the different settings intended to illustrate the applicability and usefulness of the approach in the predominant swine production systems in the EU, mechanisms and rules that govern trade organization in the different study populations are not fully comparable.
Several studies showed that, in addition to the topology of a contact network, heterogeneity in the weight of edges and temporal network dynamics had a strong influence on diseases spreading (39,40). Tools to model such networks are still under development, and their application is currently limited by the size of the networks modeled (41)(42)(43). In the next few years, these methods could be promising tools to improve our representation of real-world livestock trade networks.

cOnclUsiOn
This study is one of the very first to illustrate the usefulness of ERGMs to understand and simulate livestock trade networks under different European production systems, specifically small-scale, extensive, and intensive swine production systems. Depending on the production system, some premises characteristics, such as their geographical location, type of production, belonging to a pig company or housing system, were key drivers of pig trade, but adding statistics on local network configurations was necessary to accurately capture the great clustering observed in all pig trade networks. These models offer a framework to simulate realistic pig trade networks that may be included in epidemic models to compare different control strategies against major swine diseases such as African swine fever, classical swine fever, or porcine reproductive and respiratory syndrome. aUThOr cOnTribUTiOns AR, BM-L, and VG designed the study and developed the R codes. AR and BM-L gathered, cleaned, and verified the data. TA, AW-S, SM, EE, and JS-V contributed to the interpretation and critical discussion of the nature, characteristics, and structure of the data for the different study regions. AR carried out the analyses and wrote the draft of the manuscript. All authors participated in the interpretation and discussion of the results, read, edit, and approved the final manuscript.