Event Abstract

A novel way to predict PRRS outbreaks in the swine industry using multiple spatio-temporal features and machine learning approaches

  • 1 Department of Electrical and Computer Engineering, University of California, Davis, United States
  • 2 Department of Computer Science, University of California, Davis, United States
  • 3 Center for Animal Disease Modeling and Surveillance, School of Veterinary Medicine, University of California, Davis, United States

Introduction

The sustainability and success of the livestock industry rely on the maintenance of good livestock health, high productivity, and efficiency [1]. Therefore, the prevention, early detection, and control of diseases on farms, which may be present in both endemic and epidemic form, are key. The US is the world’s second-largest pork producer and second-largest meat exporter (The North American Meat Institute, 2019). The multi-site swine production system (i.e., separation of pigs by type and age) allows for specialized housing and feed. However, this multi-site system requires frequent movement of animals between sites and leads to areas of high pig density, both of which increase the risk of disease spread.

Porcine Reproductive and Respiratory Syndrome (PRRS) is currently the most challenging and costly viral infectious disease in the US swine industry [2]. The PRRS virus is highly variable due to mutation, which challenges vaccine development and implementation [3]. The high cost of testing and vaccination, as well as the financial damage after an outbreak, highlights the need to develop predictive models that can help identify farms at high risk of infection and so support risk-based, more cost-effective, targeted interventions. Such models will allow for more efficient testing, vaccination, and outbreak prevention.

Currently, control of PRRS relies on a combination of biosecurity, surveillance (i.e., testing), and vaccination. Testing on swine farms is conducted using serological and molecular tests that evaluate blood or oral fluids in live pigs or tissues in dead animals. Based on those testing activities, the shedding (using PCR) and exposure (using ELISA) status of the herd can be determined, and the herd can be classified into different categories [4]. For breeding herds (sow and nursery farms) there are four categories: (I) positive unstable, (II) positive stable, (III) provisional negative, and (IV) negative.
For growing herds (finishing herds) there are only two statuses: positive and negative. The challenge is that untested farms have an uncertain status and cannot easily be categorized as positive or negative. Some farm managers may decide to accept the risk of an outbreak rather than continuously test or implement strict biosecurity and vaccination protocols on their farms.

The aim of this study was to examine different machine learning models and explore diverse features to more effectively predict and promptly detect PRRS outbreaks. We do this based on location, pig movements, pig production parameters, weather information, and testing/diagnostic data of the farm. This prediction can support more efficient testing and mitigation strategies to reduce the impact of PRRS in the swine industry. Among the three major farm types (sow, nursery, and finishing), we focus here on finishing farms, which have the lowest frequency of testing and the lowest standards of immunization and biosecurity. These farms could benefit greatly from a system that helps to predict outbreaks and, more importantly, that contributes to reducing the burden of disease transmission to breeding herds through airborne or other pathways. Thus, this work examines multiple machine learning models for outbreak prediction and early detection in finishing farms using a combination of diagnostics, production, and pig trade data.

Data and Features

To conduct this study, we use data from one large swine production system in the Midwest of the United States with multiple sow, nursery, and finishing farms. For the period 2006-2019, a rich database from this system provides information on the movement of pigs between farms, the production of the farms, and PRRS testing results. A total of 3770 finishing farm production cycles (i.e., groups) are considered, of which 620 were identified as positive at some point during the production cycle. Each cycle is a data point in the proposed model that predicts the health status of the farm.
A farm is assumed to be negative if it meets two conditions: the production rate is in the top 10 percent (i.e., the 90th percentile), and the percentage of exiting pigs with weight in the standard range is in the top 10 percent (i.e., the 90th percentile). This results in 5 percent of all cycles being identified as negative.

Disease transmission in a finishing farm was assumed to occur in two main ways. The first is through the reception of pigs from other farms, which normally happens over a few days at the beginning of each cycle. The second is through airborne transmission from nearby farms [5].

To model the first pathway, we consider different risk factors, including the total number of times that pigs enter a farm during a cycle and the total number of different sources that the pigs come from. Most pigs clear the infection after becoming infected, but some become persistently infected and can potentially spread the virus if transferred to another farm. To capture this, a feature was created representing the number of entering pigs that come from a farm that had an outbreak during the pig's lifetime in a previous cycle.

To model the second pathway (airborne transmission), we defined the vicinity of a farm as the circular area around it within a defined distance. For each farm, the number of movements and the number of pigs (head) entering or exiting the vicinity are calculated. Each movement feature has two versions: one covering the period from the start of the production cycle up to the prediction date, and a historical equivalent covering the one year prior to the cycle start date. A farm surrounded by more nearby farms is at greater risk of an outbreak. More importantly, the number of outbreaks in the neighborhood of that farm during the cycle, together with the historical number of outbreaks in the last year, represents how risky that neighborhood is.
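As a rough illustration of the vicinity-based features described above, the following Python sketch counts the farms within a given radius of a target farm and the outbreaks reported on those farms, both during the current cycle up to the prediction date and during the year before the cycle start. It is not the authors' implementation: the names `Farm`, `haversine_km`, and `vicinity_features`, the use of great-circle distance, the 10 km radius, and all coordinates and dates are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Farm:
    farm_id: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_km(a: Farm, b: Farm) -> float:
    """Great-circle distance between two farms, in kilometres."""
    phi_a, phi_b = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi_b - phi_a
    d_lam = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def vicinity_features(target, farms, outbreaks, cycle_start, prediction_date, radius_km):
    """Neighborhood risk features for one production cycle (hypothetical helper).

    `outbreaks` is a list of (farm_id, outbreak_date) pairs.
    """
    # Farms inside the circular vicinity, excluding the target itself.
    neighbors = {
        f.farm_id for f in farms
        if f.farm_id != target.farm_id and haversine_km(target, f) <= radius_km
    }
    history_start = cycle_start - timedelta(days=365)
    # Current-cycle version: cycle start up to the prediction date.
    current = sum(
        1 for fid, day in outbreaks
        if fid in neighbors and cycle_start <= day <= prediction_date
    )
    # Historical version: the one year before the cycle start.
    historical = sum(
        1 for fid, day in outbreaks
        if fid in neighbors and history_start <= day < cycle_start
    )
    return {
        "n_neighbors": len(neighbors),
        "outbreaks_current_cycle": current,
        "outbreaks_prior_year": historical,
    }


# Made-up example: farm B is about 5.6 km from A, farm C about 111 km away.
farms = [Farm("A", 42.00, -93.00), Farm("B", 42.05, -93.00), Farm("C", 43.00, -93.00)]
outbreaks = [
    ("B", date(2018, 3, 10)),  # neighbor, during the current cycle
    ("B", date(2017, 9, 1)),   # neighbor, within the prior year
    ("C", date(2018, 3, 5)),   # outside the vicinity
    ("B", date(2016, 1, 1)),   # neighbor, but older than one year
]
features = vicinity_features(
    farms[0], farms, outbreaks,
    cycle_start=date(2018, 3, 1), prediction_date=date(2018, 4, 1), radius_km=10.0,
)
print(features)
```

The same pattern of one current-cycle window and one prior-year window could be applied to the movement counts; only outbreak counts are shown here to keep the sketch short.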
The production data obtained, including total feed consumed and weight, are known only at the end of the cycle and cannot be used to predict an outbreak in the same cycle. However, they can be used for the evaluation of future cycles, as they may be a good indicator of the overall performance, management practices, and risk of a farm. Thus, historical production data were built for each farm. Moreover, using the location of each farm, we obtained daily weather data from the weather station nearest to the farm. The weather data include temperature, wind speed, and relative humidity. The quarters of the cycle were also considered to capture seasonality effects. In addition, the percentage of the time that a farm has had an outbreak in the past is a combined indicator of all the above-mentioned historical factors.

Modeling Approach

Various machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting (GB), Random Forest (RF), and Neural Network (NN), were trained to predict the status of the farm. To obtain the results, the data were first sorted chronologically by prediction date. Then, the first 50 percent of the data was used for model training, the next 25 percent for hyperparameter tuning, and the remaining 25 percent for testing. For the RF and GB algorithms, the train/test procedure was performed 100 times with the best set of model parameters, and the average performance was reported.

Results

The output of these models is the probability of each farm being positive. For a given threshold, a farm is identified as positive by the model if its probability is higher than the threshold. Thus, metrics such as accuracy, sensitivity, and specificity depend on this threshold. The Receiver Operating Characteristic (ROC) curve shows how the true positive rate (sensitivity) changes against the false positive rate (1 - specificity) across thresholds.
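The chronological 50/25/25 split and the threshold-dependent metrics described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say which software was used, the function names (`chronological_split`, `confusion_counts`, `threshold_metrics`, `roc_auc`) are hypothetical, and the labels and probabilities are made-up toy values rather than study results.

```python
from datetime import date


def chronological_split(records, date_of, train_frac=0.50, tune_frac=0.25):
    """Sort records by prediction date, then cut into train / tune / test parts."""
    ordered = sorted(records, key=date_of)
    n_train = int(len(ordered) * train_frac)
    n_tune = int(len(ordered) * tune_frac)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_tune],
            ordered[n_train + n_tune:])


def confusion_counts(y_true, y_prob, threshold):
    """A farm is called positive when its probability is above the threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob > threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, y_prob):
    """Trapezoidal area under the ROC curve; needs both classes in y_true."""
    # Sweep the threshold from the highest score down to "everything positive".
    thresholds = sorted(set(y_prob), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp, fp, tn, fn = confusion_counts(y_true, y_prob, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy cycles as (prediction_date, cycle_id), deliberately out of order.
records = [(date(2019, 1, d), d) for d in (5, 1, 8, 3, 7, 2, 6, 4)]
train, tune, test = chronological_split(records, date_of=lambda r: r[0])

# Toy labels (1 = positive) and model probabilities for four test cycles.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]
print(threshold_metrics(y_true, y_prob, threshold=0.5))
print(roc_auc(y_true, y_prob))
```

Splitting by date rather than at random keeps every tuning and test cycle later than the training cycles, so no future information is used when fitting the model.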
The Area Under the Curve (AUC) of the ROC is a good metric for comparing different models. The area under the precision-recall curve (PR AUC) should also be considered for a better comparison. Table 1 compares the performance of the five machine learning models in terms of sensitivity, accuracy, specificity, etc. The results support the high predictive capabilities of the models under rich feature sets. The relative feature importance scores, which indicate the contribution of each feature to predicting farm status, show that movement in the vicinity during the current cycle, historical data regarding dead pigs, and the weight range of exiting pigs are important. Additionally, wind, historical average daily feed, and historical outbreak percentage are among the most important features.

Conclusion

To the best of our knowledge, this is one of the first attempts to apply multiple machine learning models to PRRS prediction. In addition, a rich multi-scale feature set (pig group-, farm-, and area-level data) such as the one presented in this work has not yet been considered for farm health analysis in the swine industry. The integration of historical data with current cycle data has been shown to affect prediction accuracy, which is also novel in this work. We believe this approach could be useful not only for predicting PRRS but could also be adapted to many other swine diseases.


This project was partially funded by the NSF BIGDATA:IA Award #1838207. The authors would like to acknowledge the swine industry collaborators for providing the data.


1. Tomley, F., & Shirley, M. (2009). Livestock infectious diseases and zoonoses. Philosophical Transactions of the Royal Society B, 364(1530), 2637–2642.
2. Holtkamp, D. J., et al. (2013). Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. Journal of Swine Health and Production, 21, 72–84.
3. Mateu & Diaz. (2008). The challenge of PRRS immunology. The Veterinary Journal, 177, 345–351.
4. Holtkamp, D. J., et al. (2011). Terminology for classifying swine herds by porcine reproductive and respiratory syndrome virus status. Journal of Swine Health and Production, 19:13.
5. Otake, et al. (2010). Long-distance airborne transport of infectious PRRSV and Mycoplasma hyopneumoniae from a swine population infected with multiple viral variants. Veterinary Microbiology, 145, 198–208.

Keywords: machine learning, spatio-temporal analysis, swine diseases, multi-scale modeling, disease prediction

Conference: GeoVet 2019. Novel spatio-temporal approaches in the era of Big Data, Davis, United States, 8 Oct - 10 Oct, 2019.

Presentation Type: Student senior oral presentation

Topic: Spatio-temporal surveillance and modeling approaches

Citation: Shamsabardeh M, Rezaei S, Gomez J, Martínez-López B and Liu X (2019). A novel way to predict PRRS outbreaks in the swine industry using multiple spatio-temporal features and machine learning approaches. Front. Vet. Sci. Conference Abstract: GeoVet 2019. Novel spatio-temporal approaches in the era of Big Data. doi: 10.3389/conf.fvets.2019.05.00085

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.


Received: 21 Jun 2019; Published Online: 27 Sep 2019.

* Correspondence:
Mx. M Shamsabardeh, Department of Electrical and Computer Engineering, University of California, Davis, Davis, United States, mshamsabardeh@ucdavis.edu
Prof. Beatriz Martínez-López, Center for Animal Disease Modeling and Surveillance, School of Veterinary Medicine, University of California, Davis, Davis, CA 95616-5270, United States, beamartinezlopez@ucdavis.edu
Prof. Xin Liu, Department of Computer Science, University of California, Davis, Davis, United States, xinliu@ucdavis.edu