Applications of Bayesian Phylodynamic Methods in a Recent U.S. Porcine Reproductive and Respiratory Syndrome Virus Outbreak
- 1Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- 2Environmental and Life Sciences Research Center, Kuwait Institute for Scientific Research, Kuwait City, Kuwait
- 3Department of Veterinary and Biomedical Sciences, University of Minnesota, St. Paul, MN, USA
Classical phylogenetic methods, such as neighbor-joining or maximum likelihood trees, provide limited inferences about the evolution of important pathogens and ignore important evolutionary parameters and uncertainties, which in turn limits decision making related to surveillance, control, and prevention resources. Bayesian phylodynamic models have recently been used to test research hypotheses related to evolution of infectious agents. However, few studies have attempted to model the evolutionary dynamics of porcine reproductive and respiratory syndrome virus (PRRSV) and, to the authors' knowledge, no attempt has been made to use large volumes of routinely collected data, sometimes referred to as big data, in the context of animal disease surveillance. The objective of this study was to explore and discuss the applications of Bayesian phylodynamic methods for modeling the evolution and spread of a notable 1-7-4 RFLP-type PRRSV between 2014 and 2015. A convenience sample of 288 ORF5 sequences was collected from 5 swine production systems in the United States between September 2003 and March 2015. Using coalescent and discrete trait phylodynamic models, we inferred the population growth and demographic history of the virus, identified the most likely ancestral system (root state posterior probability = 0.95), and revealed significant dispersal routes (Bayes factor > 6) of viral exchange among systems. Results indicate that currently circulating viruses are evolving rapidly and show a higher level of relative genetic diversity over time when compared to earlier relatives. Biological soundness of model results is supported by the finding that sow farms were responsible for PRRSV spread within the systems.
Such results cannot be obtained by traditional phylogenetic methods, and therefore, our results provide a methodological framework for molecular epidemiological modeling of new PRRSV outbreaks and demonstrate the prospects of phylodynamic models to inform decision-making processes for routine surveillance and, ultimately, to support prevention and control of food animal disease at local and regional scales.
Porcine Reproductive and Respiratory Syndrome (PRRS) is, arguably, the most important swine disease in the United States due to the continuous emergence of new outbreaks that cause severe economic losses (Neumann et al., 2005; Holtkamp et al., 2013). Type 2 PRRSV, which is endemic in North America, was discovered in 1989 in the U.S., although the earliest serological evidence was found in eastern Canada (Benfield et al., 1992; Zimmerman, 2003; Murtaugh et al., 2010). PRRSV is a single-stranded, enveloped RNA virus that belongs to the Arteriviridae family (Benfield et al., 1992). Its genome consists of nine open reading frames (ORF) that code seven structural proteins and 14 non-structural proteins (Dokland, 2010). ORF5 encodes a major envelope surface glycoprotein (GP5) with high genetic diversity, thus has been widely used in molecular epidemiology studies of PRRSV (Kapur et al., 1996; Shi et al., 2010; Brar et al., 2015).
PRRSV transmission is rapid and can occur through direct and indirect contact (Dea et al., 2000; Cho et al., 2007). Emerging PRRSV strains are capable of spreading over long distances, referred to as distance-independent dispersal, as a result of aerosol transmission, animal movements, and use or movement of contaminated semen, equipment, or trucks (Shi et al., 2010, 2013). The combination of varied transmission routes and absence of regulated control and prevention activities makes virus control or elimination, at both local and regional levels, extremely challenging (Corzo et al., 2010; Rowland and Morrison, 2012). Hence, intensifying efforts toward designing effective and efficient surveillance programs, with the long-term goal of eliminating the disease, must be prioritized to minimize the current impact of the PRRSV on the US swine industry (Perez et al., 2015).
Since the 1980s, the U.S. Department of Agriculture has conducted extensive surveillance activities for swine diseases using classical statistical sampling methods that can account for imperfect diagnostic testing (Cameron and Baldock, 1998). However, current disease surveillance activities do not fully account for modern swine production systems in which pigs are spatially separated by age or production stage, or for pathogens that evolve rapidly (Rowland and Morrison, 2012; Perez et al., 2015).
In the past few decades, many studies investigated the molecular epidemiology of PRRSV, due to its high potential for mutation and recombination (Martín-Valls et al., 2014). Some studies focused on establishing associations between the evolutionary features of PRRSV and epidemiological characteristics of outbreaks in different geographical levels (Goldberg et al., 2000; Shi et al., 2010, 2013; Yoon et al., 2013; Nguyen et al., 2014; Rosendal et al., 2014). Others discriminated between novel and preexisting strains to model viral spread and maintenance within affected populations (Larochelle et al., 2003; Tun et al., 2011; Alonso et al., 2013; Brito et al., 2014; Chen et al., 2015). Whether the studies used classical phylogenetic methods to either genotype newly emerging PRRSV strains on the basis of restriction fragment length polymorphism (RFLP) patterns, or assessed correlations between the similarities of nucleotide sequences and other epidemiologic features, they typically ignored uncertainties associated with estimates of phylogenetic relationships, temporal factors, and spatial factors (Suchard et al., 2001). Furthermore, they examined the temporal and spatial dynamics of the virus isolates in separate methodological settings, and attempted to draw conclusions from the outputs of both epidemiological and evolutionary analytical methods (Suchard et al., 2001). Therefore, many methodological approaches previously used to study PRRSV have ignored that evolutionary and epidemiological dynamics of rapidly evolving pathogens like PRRSV occur on approximately the same time-scale, and thus, they must be studied in a unified methodological setting in order to be properly understood and to prevent biased conclusions, subsequently improving the related decision making processes (Pybus et al., 2013). 
The field of phylodynamics aims to model, in a Bayesian statistical framework, the joint evolutionary, and epidemiological characteristics of rapidly evolving pathogens using analytical methods from the well-established field of phylogenetics (Grenfell et al., 2004). This approach uses important evolutionary parameters of rapidly evolving pathogens as random variables, and assigns a specified prior probability distribution for each parameter to infer their corresponding posterior probability distribution (Lemey et al., 2009). Thus, such Bayesian framework provides powerful analytical tools capable of accounting for uncertainties in the evolutionary parameters, including the pathogen phylogeny, population demographics, size, and history of dispersal between geographical regions and hosts (Lemey et al., 2009).
Bayesian phylodynamic models have recently become well-established tools for studying the evolution of many infectious viral diseases. However, only a few studies have modeled the evolutionary dynamics of PRRSV (Tun et al., 2011; Shi et al., 2013; Brito et al., 2014; Nguyen et al., 2014; Chaikhumwang et al., 2015). Such studies revealed the potential of phylodynamic methods in answering many long-standing questions on the molecular epidemiology and evolution of PRRSV. Furthermore, the method has previously been applied in a research context, rather than for routine surveillance of field data intended to support disease prevention and control. Such implementation is challenging because of the complexity and size of the data being analyzed. Data with these features, sometimes referred to as big data, requires special procedures for preparation and analysis.
The objective of this study was to demonstrate the application of Bayesian phylodynamic models to data routinely collected by swine production systems to support a near real-time early warning surveillance system for PRRSV and, potentially, other food animal viruses. The method was applied to the spread of a virulent RFLP 1-7-4 type PRRSV between 2014 and 2015 in the U.S. A discrete-trait phylodynamic model was adopted to estimate both the geographical history of viral migration and the movement of the virus among age groups of pigs. Our study provides quantitative estimates of mechanisms that lead to the emergence, spread and maintenance of the RFLP 1-7-4 PRRSV family throughout the U.S. It further illustrates the prospects of the Bayesian approach in improving the decision making process related to reducing the impact of PRRS on the national swine industry with the long-term goal of successful control and prevention.
Materials and Methods
Complete PRRS ORF5 nucleotide sequences (n = 6774) from field isolates obtained between January 1998 and April 2015 were provided by five independent swine production systems in the U.S. with metadata on the date of isolation, system code (A, B, C, D, and E) and type of farm (farrow to wean or farrow to feeder sow farms and growing pig farms) from which the sequences were isolated (Table S1). Sequences were deposited in Genbank with accession numbers KT902023–KT905410 and KU501407–KU504248. The data were shared under agreement that identity and location of participants and their respective farms was confidential. Sequencing was performed according to the procedures in use at the time in various veterinary diagnostic laboratories or in private laboratories on a fee-for-service basis.
Preliminary Phylogenetic Analysis
The complete sequence database was manually validated for presence of a complete ORF5 and absence of ambiguous nucleotides, then aligned using MUSCLE version 3.8 (Edgar, 2004). A maximum likelihood (ML) phylogenetic analysis was performed in MEGA6, resulting in identification of a cluster of 288 sequences that were further studied. The sequence file was re-aligned using MUSCLE and adjusted manually using the amino-acid translation method implemented in Mesquite version 3.01 (Maddison and Maddison, 2011) to ensure that the protein-coding region of ORF5 remained in frame. Sequences with 100% nucleotide identity (34%) were removed from the subsequent analyses. No homologous recombination was detected in the remaining sequences using Recombination Detection Program version 3 (RDP3; Martin et al., 2010). For this analytical approach, it is important to select the substitution model that best describes the specific virus. For example, it was found that for some Dengue viruses, the mixed substitution model best fit the data (Drummond and Rambaut, 2007). That may, however, not be true for PRRSV. Thus, the best fitting partitioning scheme and nucleotide substitution model were selected using the Bayesian Information Criterion (BIC) implemented in PartitionFinder V 1.1 (Lanfear et al., 2012). Finally, maximum-likelihood estimates of the phylogeny under the selected mixed-substitution model were used to assess the degree of topological (in)congruence, in which 100 non-parametric bootstrap replicate searches were performed using RAxML version 8 (Stamatakis, 2014).
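The removal of sequences with 100% nucleotide identity can be illustrated with a minimal sketch. It assumes the aligned sequences are held as (name, sequence) pairs and that the first isolate of each identical group is the one kept; the function name, the record names, and the toy sequences are hypothetical and are not taken from the study pipeline.

```python
def drop_identical(records):
    """Keep the first record for each distinct aligned sequence (case-insensitive)."""
    seen = set()
    kept = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept


# Toy alignment (hypothetical isolates, not study data).
records = [
    ("isolate_1", "ATGTTGGGGAAA"),
    ("isolate_2", "atgttggggaaa"),  # identical to isolate_1 apart from case
    ("isolate_3", "ATGTTGGGGAAG"),
]

unique_records = drop_identical(records)
removed_fraction = 1 - len(unique_records) / len(records)
print(f"kept {len(unique_records)} of {len(records)} sequences")
```

Which member of an identical group to retain (for example, the earliest isolation date) is a design choice that the text does not specify.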
Divergence-Time, Growth Rate, and Population Size Estimation
Divergence time was estimated using a relaxed molecular-clock model with the GTR+Γ4 mixed-substitution model, which was selected based on the results of the PartitionFinder analysis mentioned above, implemented in BEAST v 1.8 (Drummond and Rambaut, 2007). To estimate divergence time and viral growth rate within each production system, we assessed the fit of the sequence data to five node-age coalescent priors, namely: (1) constant population size, assuming that the population growth rate is zero (Griffiths and Tavare, 1994); (2) exponential growth, assuming that the population growth rate is fixed over time (Griffiths and Tavare, 1994); (3) expansion growth, assuming that the population growth rate increases over time (Griffiths and Tavare, 1994); (4) logistic growth, assuming that the population growth rate decreases over time (Griffiths and Tavare, 1994); and (5) the piece-wise-constant Bayesian skyline coalescent model (BS), assuming that effective population size experiences episodic stepwise change over time (Drummond et al., 2005). For each node-age model, we compared the uncorrelated exponential (UCED) and the uncorrelated log-normal (UCLN) relaxed clock branch-rate prior models to assess whether substitution rates on adjacent branches in our sequence data were sampled from a shared exponential or log-normal distribution, respectively. Isolation dates of the sequences were used to calibrate divergence-time estimates. We first estimated the marginal likelihood for each of the 10 candidate phylodynamic models (five node-age priors × two branch-rate models) from the resulting posterior samples using the posterior simulation-based analog of Akaike's information criterion (AICM; Raftery et al., 2007), which was estimated using Tracer version 1.6 (Suchard et al., 2001; Rambaut et al., 2014). The AICMs and their Monte Carlo standard errors (SE) were calculated using 1000 replicates.
Bayes factor (BF) comparisons indicated that the sequence data followed a population expansion growth with a UCED branch-rate model, which provided the best fit for ORF5 (BF > 25 for the log marginal likelihood) among parametric models (Table S2). However, the BF comparison was not significant when the expansion model was compared against the BS coalescent tree prior (Table S2). Hence, the BS coalescent tree prior model with a UCED branch-rate was used to estimate changes in the effective population size through time (File S2; Minin et al., 2008).
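As an illustration of the model comparison step, the AICM is commonly computed from the post-burn-in log-likelihood trace as twice the sample variance minus twice the sample mean, with lower values indicating better fit. The sketch below assumes that form; the trace values and model labels are hypothetical, and the bootstrap standard errors reported by Tracer are not reproduced.

```python
from statistics import mean, variance


def aicm(log_likelihoods):
    """AICM = 2 * sample variance - 2 * sample mean of posterior log-likelihoods."""
    return 2.0 * variance(log_likelihoods) - 2.0 * mean(log_likelihoods)


# Hypothetical post-burn-in log-likelihood samples for two candidate models.
trace_expansion = [-5010.0, -5012.0, -5011.0, -5009.0, -5013.0]
trace_constant = [-5040.0, -5046.0, -5043.0, -5041.0, -5045.0]

scores = {
    "expansion": aicm(trace_expansion),
    "constant": aicm(trace_constant),
}
best_model = min(scores, key=scores.get)  # lower AICM is preferred
print(scores, best_model)
```

In practice each trace would contain thousands of samples read from the BEAST log files rather than five illustrative values.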
We used the Markov chain Monte Carlo (MCMC) algorithms implemented in BEAST to estimate the joint posterior probability distributions of the model parameters. For each MCMC simulation, we ran 3 × 10^8 cycles, which were thinned by sampling every 10,000 cycles. Two replicate MCMC simulations were carried out to aid in assessing simulation performance. We used Tracer to evaluate convergence of each candidate model by estimating effective sample sizes (ESS) for each posterior parameter. Our ESS evaluations suggested that removal of the first 10% of the samples (the “burn-in”) was required for the MCMC algorithms to provide reliable approximations of the posterior probability densities for each estimated parameter. We used TreeAnnotator to summarize the posterior results in the form of maximum clade credibility (MCC) trees. A BS plot was generated to infer effective population size (EPS) of the virus between 2001 and 2015, in which the EPS is defined as the relative genetic diversity (NeT), where Ne is the effective population size and T is the generation time (Minin et al., 2008).
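The sampling scheme can be made concrete with a short sketch. Reading the chain length as 3 × 10^8 cycles with one sample every 10,000 cycles gives 30,000 logged samples per run, of which 27,000 remain after a 10% burn-in. The ESS function below is a simple autocorrelation-based estimator given for illustration only; it is an assumption of this sketch and is not claimed to match the estimator used in Tracer, and the two traces are synthetic.

```python
import random

# Chain settings as read from the text (3 x 10^8 cycles, one sample per 10,000).
chain_length = 3 * 10**8
sample_every = 10_000
burn_in_percent = 10

n_samples = chain_length // sample_every          # samples written to the log
n_burn_in = n_samples * burn_in_percent // 100    # samples discarded as burn-in
n_retained = n_samples - n_burn_in                # samples used for inference


def effective_sample_size(trace):
    """Rough ESS: n / (1 + 2 * sum of the leading positive autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:  # stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Synthetic traces (not study data): independent draws vs. an autocorrelated chain.
rng = random.Random(0)
independent = [rng.gauss(0.0, 1.0) for _ in range(1000)]
correlated = [0.0]
for _ in range(999):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
print(n_samples, n_retained, round(ess_independent), round(ess_correlated))
```

The autocorrelated trace yields a much smaller ESS than the independent one, which is why a parameter with a low ESS calls for a longer chain or a larger sampling interval.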
Estimation of Viral Dispersal History between Regional Systems
Geographical location was incorporated as described elsewhere (Lemey et al., 2009). Briefly, we reconstructed the phylogeny of the virus by incorporating discrete traits (i.e., systems) to describe the dispersal of the PRRSV epidemic among the selected systems. We used the continuous-time Markov model implemented in BEAST to model the dispersal history among systems as discrete states, which comprised a number of non-zero transition rates identified by a Bayesian stochastic search variable selection (BSSVS) approach (Lemey et al., 2009). Furthermore, we investigated directionality of the geographical dispersal of the virus among systems by assessing the fit of the data to two candidate discrete trait models (Table S2), including both symmetric and asymmetric models with irreversible and reversible transitions, respectively. Here, the symmetric model with irreversible transitions indicates that the directional spread of the virus between two systems (A → B and/or B → A) is insignificant, while the asymmetric model with reversible transitions indicates that the directional spread between two systems (A → B and/or B → A) is significant. To reconstruct the history of viral migration between discrete system areas, we used the coalescent Gaussian Markov random field (GMRF) Bayesian Skyride model as a prior on the node times in the tree and a mean-one exponential prior for the rate parameters of the candidate models, while we used the same remaining parameters described in the above analyses (e.g., substitution model, UCLN, and UCED branch-rate models). Similarly, we estimated the marginal likelihoods in order to compute the BFs to select among the candidate models (e.g., UCLN symmetric vs. UCED asymmetric; Table S2; File S3). We used FigTree version 1.4 (Rambaut, 2012) to plot the summarized MCC consensus tree with the root state posterior probabilities (RSPP) of system areas.
Here, the RSPP is defined as the posterior probability of transition from one discrete trait to another mapped onto the interior nodes of the phylogeny of the virus, in which a discrete trait with a high RSPP indicates that trait as the likely ancestral trait of the given phylogeny. Finally, we used SPREAD version 1.0.6 (Bielejec et al., 2011) to identify non-zero transition rates between discrete traits (i.e., significant dispersal routes among systems). We used a BF cutoff = 6 to assess the strength and significance of transition rates between discrete geographic system areas. Because actual centroids of the site locations were confidential, relationally correct, anonymous latitude and longitude locations were placed in Alaska and a keyhole markup language (KML) file was generated to visualize regional migration of the virus.
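The BF test for a dispersal route under BSSVS is usually expressed as the posterior odds that the rate is included in the model divided by the corresponding prior odds. The sketch below assumes that form. The prior inclusion probability of 0.5, the route labels, and the posterior indicator means are hypothetical values chosen for illustration; they are not the values used by SPREAD or reported in Table 2.

```python
def inclusion_bayes_factor(posterior_prob, prior_prob):
    """Posterior odds of rate inclusion divided by prior odds of rate inclusion."""
    for p in (posterior_prob, prior_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical posterior means of the BSSVS rate indicators for three routes.
prior_inclusion = 0.5  # assumed prior probability that a rate is non-zero
indicator_means = {"C-A": 0.98, "C-E": 0.90, "A-B": 0.40}

bf_cutoff = 6.0
bayes_factors = {
    route: inclusion_bayes_factor(p, prior_inclusion)
    for route, p in indicator_means.items()
}
supported_routes = sorted(r for r, bf in bayes_factors.items() if bf > bf_cutoff)
print(supported_routes)
```

An indicator mean of exactly 1 gives infinite posterior odds, so the function rejects it rather than returning a misleading number; in practice such a route would simply be reported as exceeding any cutoff.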
Modeling Viral Transmission in a System
We also modeled evolutionary movement between farm types (a proxy for production type), in which farm types were classified as sow herds (e.g., farrow to wean and farrow to feeder sow) or all other farms (e.g., finisher and nursery). A discrete-trait model was used for farm type (sow herd, other farms) to infer the history of PRRSV migration between farm types through time. The number of non-zero transition rates in the model was estimated using BSSVS. The relative strength of transition rates (e.g., sow farms → all other farms) was estimated using Bayes factors (BFs). We estimated the ancestral states (farm type) at internal nodes of the tree under a composite phylogenetic model that included the above detailed analyses. We used FigTree to plot the MCC consensus tree with the RSPP of the discrete trait, and we assessed the strength of transition rates between states (farm types) using the BF comparisons implemented in SPREAD, similar to the above analyses. Similarly, the use of the asymmetric or symmetric discrete trait models allowed us to assess the strength and significance of directionality between farm types (e.g., sow farms → other farms, and/or other farms → sow farms; File S4).
Uncertainty and Statistical Analysis of Discrete-Trait Mappings
We used the Kullback–Leibler divergence (KL) statistic to quantify the magnitude of phylogenetic uncertainty in the discrete-trait estimates of the RSPP (for regional systems and farm type; Kullback and Leibler, 1951). KL statistics were calculated for each selected tree using the Razavi function (Razavi, 2008) implemented in Matlab v 2013a (MathWorks, 2012) to measure the departure between the prior and the corresponding posterior probability distributions for a given phylodynamic parameter (i.e., in this case the RSPPs). A large KL-value (KL > 1) indicates that the data provided sufficient information for estimating the posterior parameters, so that the posterior departs substantially from the prior. Finally, we calculated the parsimony score (PS) and the association index (AI) statistics to assess the hypothesis that a taxon with a given trait (farm type or regional system) is more likely to share that trait with adjoining taxa in the MCC tree than would be expected by chance. The AI and PS statistics were calculated using Bayesian Tip-Significance Testing (BaTS) software version 1.0 (Parker, 2008). Significant AI and PS statistics indicate that our selected trait did have a significant role in shaping the posterior phylogeny of the sequence data.
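The KL statistic for a root state distribution can be sketched as follows. The example assumes a uniform prior over five systems, natural logarithms, and the divergence taken from the posterior to the prior; these choices, along with both posterior vectors, are illustrative assumptions and do not reproduce the Matlab function or the values in Table 3.

```python
import math


def kl_divergence(posterior, prior):
    """KL(posterior || prior) for two discrete distributions on the same states."""
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0.0)


# Uniform prior over five hypothetical root states (systems A-E).
prior = [0.2, 0.2, 0.2, 0.2, 0.2]

# Hypothetical posteriors: one concentrated on a single state, one nearly flat.
posterior_informative = [0.0125, 0.0125, 0.95, 0.0125, 0.0125]
posterior_flat = [0.22, 0.18, 0.21, 0.19, 0.20]

kl_informative = kl_divergence(posterior_informative, prior)
kl_flat = kl_divergence(posterior_flat, prior)
print(round(kl_informative, 3), round(kl_flat, 4))
```

A posterior that concentrates on one root state yields a KL value above 1 under these assumptions, whereas a posterior that stays close to the prior yields a value near zero, indicating that the data carried little information about the root state.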
Results and Discussion
Preliminary Phylogenetic Analysis
The ML analysis was performed to screen-out unrelated strains and because, although RFLP nomenclature is typically used to refer to PRRSV strains, the RFLP method is not an accurate discriminator of phylogenetic relations. As a result of the ML analysis, a total of 288 sequences, with isolation dates between September 2003 and March 2015, formed a phylogenetic branch shown in Figure 1. Within the branch a single, monophyletic clade of 241 sequences obtained in 15 months, between January 2014 and March 2015, stood out (Figure 1, Table S1). Those 241 were identified by the dominant RFLP-type, 1-7-4, whereas the other 45 genetically related strains belonged to a number of other RFLP types. Two nearest neighbor 1-7-4 RFLP types (depicted as green dots in Figure 1, Table S1) were collected in August 2012 and March 2007, whereas the two red dots indicated a 1-7-4 type isolated in January 2004 and a 1-4-4 type isolated in October 2006 (Figure 1, Table S1).
Figure 1. PRRSV RFLP 1-7-4 cluster filtering. (A) Maximum likelihood tree of complete PRRSV sequence dataset (N = 6774) with the 1-7-4 cluster expanded. (B) The ML tree of extracted sequences containing the 1-7-4 cluster. Two nearest neighbor 1-7-4 RFLP types (green dots) were collected in August 2012 and March 2007. Two red dots indicated a 1-7-4 type isolated in January 2004 and a 1-4-4 type isolated in October 2006. The figure was generated from File S1.
Divergence-Time, Growth Rate, and Population Size
For the PRRSV ORF5 sequence dataset isolated between September 2003 and March 2015, the BF comparisons significantly favored the parametric expansion node-age coalescent model, indicating that the population size of the current 1-7-4 type PRRSVs was under rapid increase with an estimated mean growth rate of 1.02 (95% highest posterior density, HPD, from 0.59 to 1.46) and a mean evolutionary rate of 3.27 × 10^-3 substitutions/site/year (95% HPD from 2.37 × 10^-3 to 4.27 × 10^-3), which lies within the range of previously estimated evolutionary rates for PRRSVs isolated from different geographical locations and time periods (Forsberg, 2005; Nguyen et al., 2014; Chaikhumwang et al., 2015). However, analysis of the virus population dynamics revealed a distinct continuous increase in the genetic diversity of the virus in March 2015, with no signs of population decline. This corresponds to the current increase of PRRSV incidence throughout the regional production systems in the U.S. (Figure 2). Our findings suggest that the rate, or speed, at which the number of PRRSVs in the population increased, sometimes referred to as growth rate, was higher compared to earlier phylogenetic relatives. This higher growth rate may suggest expanding diversity and an unusual continuous increase in the relative genetic diversity over time, compared to those earlier phylogenetic relatives. That finding may be attributed to an evolutionary drift that resulted from either continuous circulation or maintenance within the production region, or recombination events with field viruses that migrated from other production regions (Wang et al., 2015). An earlier study also suggested that this expanding diversity behavior of newly emerging strains is attributed to environmental factors associated with the continuous changes in swine husbandry practices rather than intrinsic factors within the host species (Murtaugh et al., 2010).
The estimated divergence time for this sequence dataset was September 1996 (95% HPD, July 1986–December 2001), which completely overlaps with the TMRCA of sequences isolated from system C (Table 1). The youngest divergence time estimated for the viruses isolated from system A was August 2009 (95% HPD, December 2007–August 2011; Table 1).
Figure 2. Bayesian Skyline plots (BSP) illustrating temporal changes in the relative genetic diversity of Porcine Reproductive and Respiratory Syndrome Virus RFLP type 1-7-4 isolated between September 2003 and March 2015 in the United States estimated from the ORF5 gene sequences. Line plots summarize estimates of the effective population size (NeT), a measure of genetic diversity, for ORF5 gene segment; the shaded regions correspond to the 95% HPD (Upper, Red; Median, Blue; Lower, Green).
Viral Dispersal History between Regional Systems
The asymmetric variants of the discrete-trait model did not achieve full convergence, even after increasing the MCMC chain length to 1 × 10^10 cycles, and were therefore discarded from the subsequent analyses. Our BF comparisons suggested that the symmetric UCED branch-rate model had the largest log-marginal likelihood (BF > 25), and hence, it was chosen as the best fitting phylodynamic model for ORF5 gene regions (Table S2). This result suggested that unidirectional spread of the virus between systems, when designated as origin and destination, had no significant role in the evolution of the currently circulating PRRSV. Figure 3 shows the ORF5 RSPP along with the time-scaled MCC tree (Figure S1). We also generated a KML file to demonstrate the temporal dynamics and spatial diffusion of the virus between systems (File S5). System C was strongly supported as the most likely regional system of origin for the currently circulating RFLP type 1-7-4, with a substantially large RSPP of 0.95. Divergence-time estimates under the discrete trait model indicated that the viral dispersal event from system C was initiated in September 2000 (95% HPD, July 1999–December 2002). Significant (BF > 6) non-zero rates for the dispersal routes between systems are summarized in Table 2. Our results suggest that the most significant routes of virus exchange were estimated exclusively between system C and all other remaining systems. Interestingly, no significant routes of viral exchange were estimated between systems other than C (Figure 4). Uncertainty and statistical analyses for validating the fit of the sequence data to the selected discrete-trait phylodynamic models are summarized in Table 3. The KL-value suggests that the data under the selected discrete phylodynamic model were able to generate RSPPs that are substantially different from the underlying priors, and thus the posterior tree is statistically robust.
Furthermore, the AI and PS tests rejected the null hypothesis of no association between sampled system and the structure of the phylogeny (P < 0.05). This strongly suggests that the geographical distribution of swine systems does indeed have a significant role in shaping the phylogeny of endemic and newly emerging PRRSV in the US. This role mainly relies on the characteristics of the hog transportation network between systems (Shi et al., 2013; Thanapongtharm et al., 2014; Brar et al., 2015).
Figure 3. Maximum clade credibility (MCC) phylogenies of ORF5 gene of Porcine Reproductive and Respiratory Syndrome Virus RFLP type 1-7-4 cluster in the United States. The color of the branches represents the most probable system type of their descendent nodes. The color-coding corresponds to the upper left figure, which represents the regional system root state posterior probability (RSPP) distributions. The figure was generated from File S6.
Figure 4. Bayes factor (BF) test for significant non-zero rates in ORF5 gene of Porcine Reproductive and Respiratory Syndrome Virus RFLP type 1-7-4 cluster in the United States. Only rates supported by a BF greater than six are indicated. The color of lines correspond to the probability of the inferred transmission rates. Blue and red line gradients indicate relatively weak to strong support, respectively. Site locations for the five systems (A–E) were anonymous and therefore latitude and longitude locations were placed in Alaska.
Table 3. Uncertainty and statistical analyses for assessing the fit of the viral data to the discrete phylodynamic models.
Virus Transmission Patterns in a System
There were two reasons for exploring the role of farm type in viral transmission: (1) to demonstrate how discrete traits may be incorporated in the analysis, and (2) to test the biological soundness of the model results, given that one would expect PRRSV spread to occur mostly from sow farms into other types of farms, following the natural flow of animals. BF comparisons indicated that the asymmetric UCED branch-rate model with reversible transitions provided the best fit for ORF5 gene regions (BF > 25 for the log marginal likelihood; Table S2). Sow farms were the most likely ancestral farm type for the currently circulating type 1-7-4 PRRSV (RSPP = 0.95; Figure 5; Figure S2). Our divergence-time estimates suggest that PRRSV originated in sow farms in approximately September 1999 (95% HPD, July 1997–December 2001), and that it has been maintained and circulated in sow farms since then. Only one significant non-zero transmission route (BF > 6) was observed, exclusively from sow farms to all other farms. However, most branches of the MCC tree under the farm type phylodynamic model were weakly supported (branch posterior probability < 0.6). In addition, the low KL-value indicated that inference under the farm type model was substantially less robust than under the systems model (Table 3). This is because small KL divergence values between any prior and posterior probability distributions indicate that the data contain little information regarding the value of the selected parameter, and therefore, its posterior probability distribution will be similar to the corresponding prior probability distribution (Lemey et al., 2009). Furthermore, the AI and PS tests failed to reject the null hypothesis of no association between farm type and the structure of the phylogeny (P > 0.05).
This is expected because the typical structure of swine farms in the US is farrow-to-weaning, which in turn segregates breeding pigs from growing pigs, and thus makes sow farms more likely sources of virus spread through pig movement than growing pig sites, from which most pigs go to market (Jeong et al., 2014). Therefore, and as suggested elsewhere, sow farms are more likely sources of transmission and maintenance of newly emerging PRRSVs (Kwong et al., 2013).
Figure 5. Maximum clade credibility (MCC) phylogenies of ORF5 gene of Porcine Reproductive and Respiratory Syndrome Virus RFLP type 1-7-4 cluster in the United States. The color of the branches represents the most probable farm type of their descendent nodes. The color-coding corresponds to the upper left figure, which represents the production type root state posterior probability (RSPP) distributions. Black branches in the tree indicate posterior probability < 0.60. The figure was generated from File S7.
Value to Participants, the Swine Industry, and Society
The veterinarians and pork producers who voluntarily share the disease status and location of their farms are vanguards in food production. By doing so, the individual participant or a particular farm risks being identified, either correctly or incorrectly, as being a source of virus evolution and spread to other farms. And yet, the nature and structure of the swine industry is much more responsible for pathogen movement than any individual farm. First, weaned pigs are transported away from the sow farm to allow pathogens such as PRRSV to be more effectively eliminated from the sow herd. This means that growing pigs must be transported to nursery and finishing sites and, by doing so, pathogens are also conveniently moved around the country. Secondly, health is difficult to maintain if growing pig sites become too large. Therefore, we have a distributed system of growing pig sites, which also contributes to pathogens being moved around the country. Finally, farms might be using live virus vaccination in the short term to reduce clinical impact and aid in the elimination of field virus in the long term. So it is a bold decision to share data for a study such as this. There is a greater good being pursued by these industry leaders. They voluntarily share their premises identities and pathogen status in the interests of national disease control such that we might detect emerging pathogens earlier than otherwise and take actions accordingly. Work such as that reported in this paper is on the cusp of a new era of disease control.
Considerations and Future Applications of the Method
The methodological approach presented here entailed several compromises, including: (1) imprecise epidemiological information related to the discrete traits investigated, and (2) incomplete and biased sampling of PRRSV ORF5 sequences. For the first, we demonstrated the impact of the accuracy and availability of epidemiological information on the MCC trees (Figures 3, 5) and their posterior inferences. This impact on the performance of phylodynamic models has been discussed elsewhere (Chaikhumwang et al., 2015). However, this issue is chronic in the context of surveillance data and almost impossible to avoid in the practical reality of animal disease surveillance (Perez et al., 2011). Therefore, rigorous analysis of a selected Bayesian phylodynamic model (i.e., assessing fit and uncertainty) is essential before deriving conclusions from its posterior inferences. For the second, inferences under the phylodynamic models assume that we have either a complete or a random sample of sequence data. In the present case, this requires that the PRRSV sequences were collected randomly with respect to time (between 1999 and 2015) with their corresponding epidemiological information. Like most phylogenetic studies, ours relied on a convenience sample and might suffer from strong sampling bias. The impact of these departures from random sampling on the estimates is difficult to quantify (Alkhamis et al., 2015). However, our study is based on all available sequence data from our participants for the ORF5 gene associated with the currently circulating RFLP type 1-7-4 epidemic in the US, and therefore reflects our best understanding based on the available data. It is worth noting that despite the unequal number of sequences obtained from different systems (Table S1), our posterior inferences for dispersal of the virus between systems were not biased toward systems with more included sequences, such as E (n = 120) and D (n = 55), when compared to C (n = 52).
This constitutes an example of the utility and robustness of such methods in the context of molecular surveillance of swine diseases.
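One common way to probe sensitivity to unequal sampling is to repeat the analysis on a subsample in which each system contributes no more than a fixed number of sequences. The sketch below shows such a downsampling step only; it is not part of the analysis reported here. The function name, the sequence identifiers, and the cap are hypothetical, and only the three system counts quoted above are used.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def downsample_by_system(
    records: List[Tuple[str, str]], cap: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Keep at most `cap` randomly chosen sequences per production system.

    `records` is a list of (sequence_id, system) pairs. A fixed seed makes
    the draw reproducible.
    """
    rng = random.Random(seed)
    by_system = defaultdict(list)
    for seq_id, system in records:
        by_system[system].append(seq_id)

    kept = []
    for system in sorted(by_system):
        ids = by_system[system]
        if len(ids) > cap:
            ids = rng.sample(ids, cap)  # sample without replacement
        kept.extend((seq_id, system) for seq_id in sorted(ids))
    return kept


# Counts for systems E, D, and C as quoted in the text; IDs are placeholders.
counts = {"E": 120, "D": 55, "C": 52}
records = [(f"{s}_{i}", s) for s, n in counts.items() for i in range(n)]
balanced = downsample_by_system(records, cap=52)  # 52 sequences kept per system
```

Re-running the discrete trait model on several such subsamples, each drawn with a different seed, would indicate whether the inferred dispersal routes depend on the over-represented systems.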
Bayesian phylodynamic models have not yet been widely accepted as a resource by veterinary agencies to support disease surveillance, control and prevention strategies. This is attributed in part to the intensive computational requirements of the methods presented here. For example, we were unable to assess the topology of 6774 sequences using BEAST due to the lack of sufficient computational resources. Instead, we used the traditional ML method to help identify the key cluster of interest, while reducing the computational requirements of the Bayesian analyses used to address our main hypotheses. That said, computational resources are continuously improving in terms of speed and cost, and therefore, in the near future, the presented analytical pipeline could be moved entirely into a Bayesian statistical framework. Moreover, previous use of such methods on avian influenza and the Ebola epidemics demonstrated the ability of phylodynamic methods to provide novel insights into the evolutionary epidemiology of infectious diseases and to support decisions regarding animal and public health (Lam et al., 2012; Pybus et al., 2013; Alizon et al., 2014). Our phylodynamic analyses of a PRRSV ORF5 sequence dataset and associated epidemiological information, in an endemic country like the U.S., were in agreement with previous inferences about the demographic histories and population growth patterns of viral lineages and sub-lineages of the virus in the U.S. (Shi et al., 2010). Bayesian phylodynamic models show one remarkable improvement compared to traditional methods, namely, they make use of associated epidemiological information, such as time and place of isolation, to infer genetic relations.
The inclusion of information on nucleotide substitution schemes obtained from the data, which allows different model assumptions to be used when assessing the degree of genetic relatedness under time-scaled phylogenies, has provided a robust strategy, for example, to distinguish between potentially related PRRSV strains detected in air samples and swine farms in high and low swine density regions (Brito et al., 2014). In the analysis here, we incorporated time of prior isolation to reconstruct the phylogenetic dendrogram, thereby making use of temporal distances to infer genetic relations. This approach can help to shed further light on several evolutionary and epidemiological characteristics of endemic PRRSV. Furthermore, extended phylodynamic models can provide insights into the ancestral origins of the outbreak between swine systems (e.g., the ancestral system or herd type) and the spatio-temporal progression of an epidemic. These inferences could be used, for example, to identify viral dispersion routes that correspond with transportation patterns involving high PRRSV risk.
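How sampling dates carry information about evolutionary rate can be illustrated with a technique much simpler than the Bayesian models used in this study: a root-to-tip regression, in which the genetic distance of each tip from the root is regressed on its sampling date. The slope gives a rough substitution rate and the date at which the fitted line reaches zero distance gives a rough age of the root. The sketch below is illustrative only; the function name and the toy values are hypothetical and are not taken from the PRRSV data.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(
    dates: Sequence[float], distances: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope in substitutions/site/year and the
    date at which the fitted line crosses zero distance.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate  # x-intercept of the fitted line
    return rate, root_date


# Hypothetical toy data: sampling years and root-to-tip distances.
toy_dates = [2010.0, 2012.0, 2014.0]
toy_distances = [0.01, 0.03, 0.05]
rate, root_date = root_to_tip_regression(toy_dates, toy_distances)
```

Unlike the Bayesian approach, this regression treats the tips as independent observations and yields no credible intervals, so it is best read as a quick check that the sequences contain temporal signal rather than as a substitute for the full model.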
Classical phylogenetic methods, such as neighbor-joining or maximum likelihood trees, provide limited inferences about the evolution of important pathogens and ignore important evolutionary parameters and uncertainties, which in turn limits decision making related to surveillance, control, and prevention resources. In this study, we illustrated the applications and potential of phylodynamic methods as tools for molecular surveillance of food animal viruses by assessing the evolution of newly emerging PRRSVs in the U.S. We analyzed different epidemiological and evolutionary aspects of a recently collected ORF5 gene sequence dataset. Using coalescence and discrete trait phylodynamic models, we obtained a phylogeny adjusted for many important epidemiological parameters such as space, time, and host type. Furthermore, we were able to (1) infer population growth and demographic history of the virus, which aids in assessing the magnitude of epidemic progression; (2) identify the most likely ancestral system, which aids in guiding risk-based surveillance activities; and (3) model viral transmission patterns between systems and farm types, which provides important insights into viral transmission dynamics between and within swine herds. Accordingly, incorporating phylodynamic analyses as a standard tool for the molecular surveillance of swine diseases might support the development of more effective, economically rational policy decisions for the control of PRRSV in high-risk systems. However, investments must be mobilized toward improving genomic databases and building efficient bioinformatics and computational infrastructures, which are the base requirements for the field of applied phylodynamics (Scotch et al., 2011; Scotch and Mei, 2013).
MA formulated the Bayesian models and was primarily responsible for report and manuscript preparation; AP provided interpretation on the use of epidemiological models, collaborated in the design of the analytical model, and assisted in manuscript preparation; MM contributed to the interpretation of results related to PRRSV genetic dynamics and to manuscript preparation and editing; XW helped with data preparation and management; RM conceived the study, was responsible for communication with the industry and supervision of the entire project, provided insight on the implementation of results at the field level, and assisted in manuscript preparation.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This study was funded in part by the University of Minnesota MnDrive and College of Veterinary Medicine Population Systems grant programs, by the National Pork Board, and by Boehringer Ingelheim Vetmedica, Inc. We also acknowledge the five anonymous systems and their herd veterinarians who graciously provided sequences and related data that made the study possible.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/article/10.3389/fmicb.2016.00067
Alizon, S., Lion, S., Murall, C. L., and Abbate, J. L. (2014). Quantifying the epidemic spread of Ebola virus (EBOV) in Sierra Leone using phylodynamics. Virulence 5, 825–827. doi: 10.4161/21505594.2014.976514
Alkhamis, M. A., Moore, B. R., and Perez, A. M. (2015). Phylodynamics of H5N1 highly pathogenic avian influenza in Europe, 2005–2010: potential for molecular surveillance of new outbreaks. Viruses 7, 3310–3328. doi: 10.3390/v7062773
Alonso, C., Murtaugh, M. P., Dee, S. A., and Davies, P. R. (2013). Epidemiological study of air filtration systems for preventing PRRSV infection in large sow herds. Prev. Vet. Med. 112, 109–117. doi: 10.1016/j.prevetmed.2013.06.001
Benfield, D. A., Nelson, E., Collins, J. E., Harris, L., Goyal, S. M., Robison, D., et al. (1992). Characterization of swine infertility and respiratory syndrome (SIRS) virus (isolate ATCC VR-2332). J. Vet. Diagn. Invest. 4, 127–133. doi: 10.1177/104063879200400202
Brar, M. S., Shi, M., Murtaugh, M. P., and Leung, F. C. (2015). Evolutionary diversification of type 2 porcine reproductive and respiratory syndrome virus. J. Gen. Virol. 96, 1570–1580. doi: 10.1099/vir.0.000104
Brito, B., Dee, S., Wayne, S., Alvarez, J., and Perez, A. (2014). Genetic diversity of PRRS virus collected from air samples in four different regions of concentrated swine production during a high incidence season. Viruses 6, 4424–4436. doi: 10.3390/v6114424
Chaikhumwang, P., Tantituvanont, A., Tripipat, T., Tipsombatboon, P., Piriyapongsa, J., and Nilubol, D. (2015). Dynamics and evolution of highly pathogenic porcine reproductive and respiratory syndrome virus following its introduction into a herd concurrently infected with both types 1 and 2. Infect. Genet. Evol. 30, 164–174. doi: 10.1016/j.meegid.2014.12.025
Chen, J. Z., Peng, J. M., Bai, Y., Wang, Q., Liu, Y. M., Zhang, Q. Y., et al. (2015). Characterization of two novel porcine reproductive and respiratory syndrome virus isolates with deletions in the GP2 gene. Vet. Microbiol. 176, 344–351. doi: 10.1016/j.vetmic.2015.01.018
Corzo, C. A., Mondaca, E., Wayne, S., Torremorell, M., Dee, S., Davies, P., et al. (2010). Control and elimination of porcine reproductive and respiratory syndrome virus. Virus Res. 154, 185–192. doi: 10.1016/j.virusres.2010.08.016
Dea, S., Gagnon, C. A., Mardassi, H., Pirzadeh, B., and Rogan, D. (2000). Current knowledge on the structural proteins of porcine reproductive and respiratory syndrome (PRRS) virus: comparison of the North American and European isolates. Arch. Virol. 145, 659–688. doi: 10.1007/s007050050662
Drummond, A. J., Rambaut, A., Shapiro, B., and Pybus, O. G. (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22, 1185–1192. doi: 10.1093/molbev/msi103
Goldberg, T. L., Hahn, E. C., Weigel, R. M., and Scherba, G. (2000). Genetic, geographical and temporal variation of porcine reproductive and respiratory syndrome virus in Illinois. J. Gen. Virol. 81, 171–179. doi: 10.1099/0022-1317-81-1-171
Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L., Daly, J. M., Mumford, J. A., et al. (2004). Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303, 327–332. doi: 10.1126/science.1090727
Holtkamp, D. J., Kliebenstein, J. B., Neumann, E. J., Zimmerman, J. J., Rotto, H. F., Yoder, T. K., et al. (2013). Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. J. Swine Health Prod. 21, 72–84. Available online at: https://www.aasv.org/shap/issues/v21n2/v21n2p72.pdf
Jeong, J., Aly, S. S., Cano, J. P., Polson, D., Kass, P. H., and Perez, A. M. (2014). Stochastic model of porcine reproductive and respiratory syndrome virus control strategies on a swine farm in the United States. Am. J. Vet. Res. 75, 260–267. doi: 10.2460/ajvr.75.3.260
Kapur, V., Elam, M. R., Pawlovich, T. M., and Murtaugh, M. P. (1996). Genetic variation in porcine reproductive and respiratory syndrome virus isolates in the midwestern United States. J. Gen. Virol. 77(Pt 6), 1271–1276. doi: 10.1099/0022-1317-77-6-1271
Kwong, G. P., Poljak, Z., Deardon, R., and Dewey, C. E. (2013). Bayesian analysis of risk factors for infection with a genotype of porcine reproductive and respiratory syndrome virus in Ontario swine herds using monitoring data. Prev. Vet. Med. 110, 405–417. doi: 10.1016/j.prevetmed.2013.01.004
Lam, T. T., Hon, C. C., Lemey, P., Pybus, O. G., Shi, M., Tun, H. M., et al. (2012). Phylodynamics of H5N1 avian influenza virus in Indonesia. Mol. Ecol. 21, 3062–3077. doi: 10.1111/j.1365-294X.2012.05577.x
Lanfear, R., Calcott, B., Ho, S. Y., and Guindon, S. (2012). Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701. doi: 10.1093/molbev/mss020
Larochelle, R., D'Allaire, S., and Magar, R. (2003). Molecular epidemiology of porcine reproductive and respiratory syndrome virus (PRRSV) in Quebec. Virus Res. 96, 3–14. doi: 10.1016/S0168-1702(03)00168-0
Maddison, W. P., and Maddison, D. R. (2011). Mesquite Version 2.75: A Modular System for Evolutionary Analysis. Available online at: http://mesquiteproject.org (Accessed February 20, 2014).
Martin, D. P., Lemey, P., Lott, M., Moulton, V., Posada, D., and Lefeuvre, P. (2010). RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462–2463. doi: 10.1093/bioinformatics/btq467
Martín-Valls, G. E., Kvisgaard, L. K., Tello, M., Darwich, L., Cortey, M., Burgara-Estrella, A. J., et al. (2014). Analysis of ORF5 and full-length genome sequences of porcine reproductive and respiratory syndrome virus isolates of genotypes 1 and 2 retrieved worldwide provides evidence that recombination is a common phenomenon and may produce mosaic isolates. J. Virol. 88, 3170–3181. doi: 10.1128/JVI.02858-13
Minin, V. N., Bloomquist, E. W., and Suchard, M. A. (2008). Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25, 1459–1471. doi: 10.1093/molbev/msn090
Murtaugh, M. P., Stadejek, T., Abrahante, J. E., Lam, T. T., and Leung, F. C. (2010). The ever-expanding diversity of porcine reproductive and respiratory syndrome virus. Virus Res. 154, 18–30. doi: 10.1016/j.virusres.2010.08.015
Neumann, E. J., Kliebenstein, J. B., Johnson, C. D., Mabry, J. W., Bush, E. J., Seitzinger, A. H., et al. (2005). Assessment of the economic impact of porcine reproductive and respiratory syndrome on swine production in the United States. J. Am. Vet. Med. Assoc. 227, 385–392. doi: 10.2460/javma.2005.227.385
Nguyen, V. G., Kim, H. K., Moon, H. J., Park, S. J., Chung, H. C., Choi, M. K., et al. (2014). A Bayesian phylogeographical analysis of type 1 porcine reproductive and respiratory syndrome virus (PRRSV). Transbound. Emerg. Dis. 61, 537–545. doi: 10.1111/tbed.12058
Perez, A., Alkhamis, M., Carlsson, U., Brito, B., Carrasco-Medanic, R., Whedbee, Z., et al. (2011). Global animal disease surveillance. Spat. Spatiotemporal Epidemiol. 2, 135–145. doi: 10.1016/j.sste.2011.07.006
Perez, A. M., Davies, P. R., Goodell, C. K., Holtkamp, D. J., Mondaca-Fernández, E., Poljak, Z., et al. (2015). Lessons learned and knowledge gaps about the epidemiology and control of porcine reproductive and respiratory syndrome virus in North America. J. Am. Vet. Med. Assoc. 246, 1304–1317. doi: 10.2460/javma.246.12.1304
Pybus, O. G., Fraser, C., and Rambaut, A. (2013). Evolutionary epidemiology: preparing for an age of genomic plenty. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20120193. doi: 10.1098/rstb.2012.0193
Rambaut, A. (2012). FigTree v.1.4.2: Tree Figure Drawing Tool. Available online at: http://tree.bio.ed.ac.uk/software/figtree (Accessed August 11, 2014).
Rambaut, A., Suchard, M. A., Xie, D., and Drummond, A. J. (2014). Tracer v1.6. Available online at: http://beast.bio.ed.ac.uk/Tracer
Razavi, N. (2008). Kullback–Leibler Divergence. Available online at: http://www.mathworks.com/matlabcentral/fileexchange/20688-kullback-leibler-divergence (Accessed March 14, 2014).
Rosendal, T., Dewey, C., Friendship, R., Wootton, S., Young, B., and Poljak, Z. (2014). Association between PRRSV ORF5 genetic distance and differences in space, time, ownership and animal sources among commercial pig herds. Transbound. Emerg. Dis. doi: 10.1111/tbed.12253. [Epub ahead of print].
Rowland, R. R., and Morrison, R. B. (2012). Challenges and opportunities for the control and elimination of porcine reproductive and respiratory syndrome virus. Transbound. Emerg. Dis. 59(Suppl. 1), 55–59. doi: 10.1111/j.1865-1682.2011.01306.x
Scotch, M., and Mei, C. (2013). Phylogeography of swine influenza H3N2 in the United States: translational public health for zoonotic disease surveillance. Infect. Genet. Evol. 13, 224–229. doi: 10.1016/j.meegid.2012.09.015
Scotch, M., Sarkar, I. N., Mei, C., Leaman, R., Cheung, K. H., Ortiz, P., et al. (2011). Enhancing phylogeography by improving geographical information from GenBank. J. Biomed. Inform. 44(Suppl. 1), S44–S47. doi: 10.1016/j.jbi.2011.06.005
Shi, M., Lam, T. T., Hon, C. C., Murtaugh, M. P., Davies, P. R., Hui, R. K., et al. (2010). Phylogeny-based evolutionary, demographical, and geographical dissection of North American type 2 porcine reproductive and respiratory syndrome viruses. J. Virol. 84, 8700–8711. doi: 10.1128/JVI.02551-09
Shi, M., Lemey, P., Singh Brar, M., Suchard, M. A., Murtaugh, M. P., Carman, S., et al. (2013). The spread of type 2 Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) in North America: a phylogeographic approach. Virology 447, 146–154. doi: 10.1016/j.virol.2013.08.028
Suchard, M. A., Weiss, R. E., and Sinsheimer, J. S. (2001). Bayesian selection of continuous-time Markov chain evolutionary models. Mol. Biol. Evol. 18, 1001–1013. doi: 10.1093/oxfordjournals.molbev.a003872
Thanapongtharm, W., Linard, C., Pamaranon, N., Kawkalong, S., Noimoh, T., Chanachai, K., et al. (2014). Spatial epidemiology of porcine reproductive and respiratory syndrome in Thailand. BMC Vet. Res. 10:174. doi: 10.1186/s12917-014-0174-y
Tun, H. M., Shi, M., Wong, C. L., Ayudhya, S. N., Amonsin, A., Thanawonguwech, R., et al. (2011). Genetic diversity and multiple introductions of porcine reproductive and respiratory syndrome viruses in Thailand. Virol. J. 8, 164. doi: 10.1186/1743-422X-8-164
Wang, X., Marthaler, D., Rovira, A., Rossow, S., and Murtaugh, M. P. (2015). Emergence of a virulent porcine reproductive and respiratory syndrome virus in vaccinated herds in the United States. Virus Res. 210, 34–41. doi: 10.1016/j.virusres.2015.07.004
Yoon, S. H., Kim, H., Kim, J., Lee, H. K., Park, B., and Kim, H. (2013). Complete genome sequences of porcine reproductive and respiratory syndrome viruses: perspectives on their temporal and spatial dynamics. Mol. Biol. Rep. doi: 10.1007/s11033-013-2802-1. [Epub ahead of print].
Zimmerman, J., Yoon, K.-J., and Neumann, E. (2003). PRRS Compendium Producer Edition. Des Moines, IA: National Pork Board. Available online at: http://www.pork.org/production-topics/swine-health/2003-prrs-compendium-producer-edition/
Keywords: Bayesian phylodynamics, PRRSV, RFLP type 1-7-4, ORF5 gene, molecular surveillance
Citation: Alkhamis MA, Perez AM, Murtaugh MP, Wang X and Morrison RB (2016) Applications of Bayesian Phylodynamic Methods in a Recent U.S. Porcine Reproductive and Respiratory Syndrome Virus Outbreak. Front. Microbiol. 7:67. doi: 10.3389/fmicb.2016.00067
Received: 25 August 2015; Accepted: 14 January 2016;
Published: 02 February 2016.
Edited by: Jörg Linde, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knoell-Institute, Germany
Reviewed by: Hein Min Tun, University of Manitoba, Canada
Sebastian Mueller, University of Cambridge, UK
Copyright © 2016 Alkhamis, Perez, Murtaugh, Wang and Morrison. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mohammad A. Alkhamis, firstname.lastname@example.org