# ANIMAL GENETICS AND DISEASES: ADVANCES IN FARMING AND LIVESTOCK SYSTEMS

EDITED BY: Mark S. Fife, John A. Hammond and Andrea B. Doeschl-Wilson

PUBLISHED IN: Frontiers in Genetics

#### Frontiers eBook Copyright Statement

The copyright in the text of individual articles in this eBook is the property of their respective authors or their respective institutions or funders. The copyright in graphics and images within each article may be subject to copyright of other parties. In both cases this is subject to a license granted to Frontiers. The compilation of articles constituting this eBook is the property of Frontiers.

Each article within this eBook, and the eBook itself, are published under the most recent version of the Creative Commons CC-BY licence. The version current at the date of publication of this eBook is CC-BY 4.0. If the CC-BY licence is updated, the licence granted by Frontiers is automatically updated to the new version.

When exercising any right under the CC-BY licence, Frontiers must be attributed as the original publisher of the article or eBook, as applicable.

Authors have the responsibility of ensuring that any graphics or other materials which are the property of others may be included in the CC-BY licence, but this should be checked before relying on the CC-BY licence to reproduce those materials. Any copyright notices relating to those materials must be complied with.

Copyright and source acknowledgement notices may not be removed and must be displayed in any copy, derivative work or partial copy which includes the elements in question.

All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For further information please read Frontiers' Conditions for Website Use and Copyright Statement, and the applicable CC-BY licence.

ISSN 1664-8714 ISBN 978-2-88963-523-8 DOI 10.3389/978-2-88963-523-8

# ANIMAL GENETICS AND DISEASES: ADVANCES IN FARMING AND LIVESTOCK SYSTEMS

Topic Editors:

Mark S. Fife, AVIAGEN UK Ltd, United Kingdom

John A. Hammond, Pirbright Institute, United Kingdom

Andrea B. Doeschl-Wilson, University of Edinburgh, United Kingdom

The new Animal Genetics and Disease 2017 conference committee organized a Research Topic for the proceedings of this inaugural conference. The meeting brought together specialists working on the interface between genomics, genetic engineering, and infectious disease, with the aims of improving animal and human health and welfare. This conference was funded by Advanced Courses and Scientific Conference at the Wellcome Genome Campus, Hinxton, UK.

The conference highlighted breakthroughs in genomic technologies that are rapidly increasing our understanding of the fundamental role that host and pathogen genetics play in infections and epidemics. This Research Topic focuses on how infections spread and how they further affect the productivity of livestock systems and food supply chains. Thanks to technological advances, we now have the tools for real-time surveillance of zoonoses affecting wildlife, farm animals and animal-to-human disease transmission.

Citation: Fife, M. S., Hammond, J. A., Doeschl-Wilson, A. B., eds. (2020). Animal Genetics and Diseases: Advances in Farming and Livestock Systems. Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88963-523-8

# Table of Contents


Kaylee Rowland, Anna Wolc, Rodrigo A. Gallardo, Terra Kelly, Huaijun Zhou, Jack C. M. Dekkers and Susan J. Lamont


Mélanie Gunia, Ingrid David, Jacques Hurtaud, Mickaël Maupin, Hélène Gilbert and Hervé Garreau

*The Genomic Architecture of Fowl Typhoid Resistance in Commercial Layers*

Androniki Psifidi, Kay M. Russell, Oswald Matika, Enrique Sánchez-Molano, Paul Wigley, Janet E. Fulton, Mark P. Stevens and Mark S. Fife

*Sequence Characterization of* DSG3 *Gene to Know its Role in High-Altitude Hypoxia Adaptation in the Chinese Cashmere Goat*

Chandar Kumar, Shen Song, Lin Jiang, Xiaohong He, Qianjun Zhao, Yabin Pu, Kanwar Kumar Malhi, Asghar Ali Kamboh and Yuehui Ma

*Dissecting the Genomic Architecture of Resistance to* Eimeria maxima *Parasitism in the Chicken*

Kay Boulton, Matthew J. Nolan, Zhiguang Wu, Valentina Riggio, Oswald Matika, Kimberley Harman, Paul M. Hocking, Nat Bumstead, Pat Hesketh, Andrew Archer, Stephen C. Bishop, Pete Kaiser, Fiona M. Tomley, David A. Hume, Adrian L. Smith, Damer P. Blake and Androniki Psifidi


Enrique Sánchez-Molano, Veysel Bay, Robert F. Smith, Georgios Oikonomou and Georgios Banos

# Transcriptional Innate Immune Response of the Developing Chicken Embryo to Newcastle Disease Virus Infection

Megan A. Schilling<sup>1,2,3</sup>, Robab Katani<sup>1,2,4</sup>, Sahar Memari<sup>5</sup>, Meredith Cavanaugh<sup>5</sup>, Joram Buza<sup>3</sup>, Jessica Radzio-Basu<sup>1</sup>, Fulgence N. Mpenda<sup>3</sup>, Melissa S. Deist<sup>6</sup>, Susan J. Lamont<sup>6</sup> and Vivek Kapur<sup>1,2,3</sup>\*

<sup>1</sup> Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, United States, <sup>2</sup> Department of Animal Science, Pennsylvania State University, University Park, PA, United States, <sup>3</sup> School of Life Sciences and Bio-Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania, <sup>4</sup> Applied Biological Research Laboratory, Pennsylvania State University, University Park, PA, United States, <sup>5</sup> Department of Biology, Pennsylvania State University, University Park, PA, United States, <sup>6</sup> Department of Animal Science, Iowa State University, Ames, IA, United States

#### Edited by:

Mark S. Fife, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Jacqueline Smith, The University of Edinburgh, United Kingdom

Androniki Psifidi, Royal Veterinary College, United Kingdom

> \*Correspondence: Vivek Kapur vkapur@psu.edu

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 27 September 2017

Accepted: 09 February 2018

Published: 27 February 2018

#### Citation:

Schilling MA, Katani R, Memari S, Cavanaugh M, Buza J, Radzio-Basu J, Mpenda FN, Deist MS, Lamont SJ and Kapur V (2018) Transcriptional Innate Immune Response of the Developing Chicken Embryo to Newcastle Disease Virus Infection. Front. Genet. 9:61. doi: 10.3389/fgene.2018.00061

Traditional approaches to assess the immune response of chickens to infection are through animal trials, which are expensive, require enhanced biosecurity, compromise welfare, and are frequently influenced by confounding variables. Since the chicken embryo becomes immunocompetent prior to hatch, we here characterized the transcriptional response of selected innate immune genes to Newcastle disease virus (NDV) infection in chicken embryos at days 10, 14, and 18 of embryonic development. The results suggest that the innate immune response 72 h after challenge of the 18-day chicken embryo is both consistent and robust. The expression of CCL5, Mx1, and TLR3 in lung tissues of NDV-challenged chicken embryos from the outbred Kuroiler and Tanzanian local ecotype lines showed that their expression was several orders of magnitude higher in the Kuroiler than in the local ecotypes. Next, the expression patterns of three additional innate-immunity related genes, IL-8, IRF-1, and STAT1, were examined in the highly congenic Fayoumi (M5.1 and M15.2) and Leghorn (Ghs6 and Ghs13) sublines that differ only at the microchromosome bearing the major histocompatibility locus. The results show that the Ghs13 Leghorn subline had a consistently higher expression of all genes except IL-8 and that expression seemed to be subline-dependent rather than breed-dependent, suggesting that the innate immune response of chicken embryos to NDV infection may be genetically controlled by the MHC locus. Taken together, the results suggest that the chicken embryo may represent a promising model for studying the patterns and sources of variation of the avian innate immune response to infection with NDV and related pathogens.

Keywords: backyard poultry, chicken embryo, Newcastle disease virus, innate immune response, transcriptional response

## INTRODUCTION

Newcastle disease virus (NDV) is one of the most important poultry pathogens worldwide, with over eighty countries in North and South America, Europe, Asia, and Africa reporting outbreaks each year (Diel et al., 2012). NDV infections manifest through a wide range of strain dependent symptoms including those within the respiratory system (coughing, sneezing, and wheezing), the nervous system (such as twisted neck, tremors, and paralysis), and the reproductive system (loss in egg production). Mortality rates may reach as high as 100% in unvaccinated flocks (Ashraf and Shah, 2014). Unsurprisingly then, NDV infections are responsible for considerable economic losses to poultry production in both developed and developing countries. For instance, the 2002–2003 NDV outbreak in California resulted in the destruction of 3 million birds and financial losses of over \$160 million (Diel et al., 2012).

Differing levels of susceptibility to NDV have been reported among commercial breeds, as well as between commercial poultry and local ecotypes; however, the mechanisms contributing to these differences remain unknown (Zhou and Lamont, 1999; Minga et al., 2004; Deist et al., 2017b). For instance, studies report that the highly inbred Fayoumi lines are less susceptible to NDV infection (and other infections) than single comb, highly inbred White Leghorn chickens (Zhou and Lamont, 1999). Similarly, backyard chickens are generally considered less susceptible to NDV and other infections than commercial chickens that are bred for high productivity. This reduced susceptibility is presumably due to pre-sensitization through more frequent natural exposure to pathogens in the scavenging environment, as well as to natural selection for hardiness under higher levels of pathogen exposure. However, considerable variation in response to NDV infection has been noted both between and within ecotypes commonly found in backyard settings (Minga et al., 2004). Another breed of chicken hypothesized to be less susceptible to disease is the Kuroiler, a dual-purpose chicken first bred to improve both meat and egg production of backyard poultry in India. The Kuroiler, which has multi-colored feathers to help with camouflage in the wild and thrives in backyard or scavenging environments, has recently been introduced in East Africa (including Tanzania), with reports suggesting that it can coexist with, and out-produce, the local chickens in Uganda and Kenya (Dessie and Getachew, 2016).

Although differences in susceptibility are observed between these breeds, the underlying immune mechanisms contributing to these differences remain unknown. With the widespread use of NDV vaccines, studies of immunity have focused on antibody production and cell-mediated responses more so than innate immunity (Ahmed et al., 2007; Kapczynski et al., 2013). One study demonstrated a rapid and robust innate response shortly after virulent NDV infection using a microarray analysis of chicken spleen tissue (Rue et al., 2011). However, the level of susceptibility to NDV was not examined in the study, hence, whether the innate immune response plays a role in enhancing immunity to NDV in poultry, and particularly in backyard poultry, is presently not understood.

Current techniques to evaluate the immune response and disease susceptibility in chickens through challenge of live birds are expensive and difficult to interpret due to confounding factors including age, natural exposure to infectious agents, the normal microflora, variability in dosing animals through natural routes of exposure, nutritional status, as well as exposure to other environmental stressors. In contrast, the chicken embryo, enclosed in the protective environment of the shell, is not only considerably cheaper to study, but may also be less influenced by the confounding environmental factors that can affect the results of challenge studies in hatched chicks.

It has been well described that the developing chicken embryo is able to produce an immune response to a pathogen prior to hatch, a feature that is widely exploited in modern large-scale poultry production through the routine administration of in ovo vaccination for multiple pathogens, including Marek's Disease (MD) and Infectious Bursal Disease (IBD) (Sharma and Burmester, 1984; Stone et al., 1997; Seal et al., 2000). Embryonic development occurs over 21 days, and by the 10th day the first signs of the immune system are observed. On days 11 and 12, T cells and B cells are developed, respectively, with B cell differentiation occurring after the 15th day of development. By the 18th day of embryonic development, the chicken embryo is immunocompetent and is capable of producing both an innate and an adaptive response to pathogens (Davison, 2003; Ribatti, 2010; Mississippi State University Extension, 2017). Importantly, immune responses have been shown to be comparable in birds vaccinated in ovo or at later time points post-hatch (Sharma and Burmester, 1984; Stone et al., 1997; Gagic et al., 1999; Sharma, 1999; Steel et al., 2008). For example, vaccination with an inactivated oil emulsion NDV and Avian Influenza Virus (AIV) vaccine on the 18th day of embryonic development resulted in high seroconversion rates and antibody titers post hatch (Stone et al., 1997). An MD in ovo vaccine was also able to generate a four times greater level of protection than post hatch vaccination (Sharma and Burmester, 1984). Even though in ovo vaccination has now been applied for several decades, the mechanisms of induction of protective immunity in the chicken embryo remain poorly understood.

To begin to address this knowledge gap, we here transcriptionally profiled the innate immune response of the chicken embryo to NDV infection in both highly inbred and outbred lines. Our studies begin to demonstrate the use of the chicken embryo as a tool to examine the immune response to NDV since signatures of a consistent, breed-dependent innate immune response post NDV infection are present.

## MATERIALS AND METHODS

#### Ethics Statement and Animal Use

Animal use protocols were approved by the Pennsylvania State University IACUC committee (protocol numbers 46395 and 47175). Specific pathogen free (SPF) eggs from White Leghorn chickens were sourced through Charles River Laboratories International, Inc. (North Franklin, CT, United States). Tanzanian local ecotype and Kuroiler hatching eggs were sourced from Urio Cross and Pure Breeding LTD, a local farm, in Tanzania (Tengeru, Arusha, Tanzania). Embryonated eggs from two well-defined, inbred Leghorn sublines, Ghs6 and Ghs13, as well as two inbred Fayoumi sublines, M5.1 and M15.2, from Iowa State University Poultry Farm (Ames, IA, United States) were also included in this study. Eggs were incubated (37.5°C, 55% humidity, rotating hourly) and only temporarily removed from incubation to candle for viability and perform inoculations with virus.

## Virus

The lentogenic LaSota strain of NDV was kindly provided by Dr. Siba Samal at the University of Maryland, College of Veterinary Medicine (College Park, MD, United States). The virus was titrated, giving a final titer for the undiluted viral suspension of 10<sup>7</sup> 50% egg infectious doses (EID<sub>50</sub>)/mL. The viral suspension was stored at −80°C until further use.

## Embryonated Chicken Egg Infection with NDV and Chicken Embryo Tissue Harvest

Embryonated eggs were inoculated at days 10, 14, or 18 of embryonic development with 0.1 mL of the viral suspension deposited directly into the allantoic fluid. Eggs were sealed with an adhesive glue and placed back in incubators until death or removal for tissue harvest. Controls (uninfected eggs) were treated similarly with inoculation of 0.1 mL of phosphate buffered saline (PBS). Eggs were candled daily once infected, and embryos (including embryos that were dead at that time point) were randomly selected for tissue harvest 24, 48, and 72 h post infection (hpi). The embryos selected in the group infected at 18 days and harvested 72 hpi were as close to hatch as possible without allowing the chick to hatch. Prior to harvest, the eggs were placed at 4°C for 3–4 h to avoid opening eggs with viable embryos. Each harvested embryo was extracted and rinsed three times in 1X PBS before selected tissues (based on the size of the embryo) were harvested (Supplementary Table S1). Tissues harvested at the 10-day infection included two sections, the body and head, as the chicken embryo was too small and underdeveloped to obtain individual tissue samples. After the 14-day infection, the spleen, heart, and liver were harvested, and after the 18-day infection, the spleen, heart, liver, and lung tissues were harvested. In the experiment using the SPF White Leghorn eggs, tissues from three infected and three control embryos were harvested at each time point; for the Kuroiler and local ecotype embryos, eight infected and eight control embryos were harvested; and for the Leghorn and Fayoumi sublines, six infected and six control embryos were harvested for Ghs6, Ghs13, and M5.1, and eight infected and six control embryos were harvested for M15.2. The tissues were immediately stored at −80°C prior to further use.
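The infection and harvest design described above can be laid out as a small grid. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the per-egg dose assumes that the viral suspension was inoculated undiluted, which the text does not state explicitly.

```python
# Values are taken from the Methods text; helper names are hypothetical.
INFECTION_DAYS = (10, 14, 18)   # embryonic day of inoculation
HARVEST_HPI = (24, 48, 72)      # hours post infection at harvest
HATCH_DAY = 21                  # embryonic development takes 21 days

TITER_EID50_PER_ML = 1e7        # titer of the undiluted viral suspension
INOCULUM_ML = 0.1               # volume deposited into the allantoic fluid


def harvest_schedule():
    """Return (infection day, hpi, embryonic age in days at harvest) tuples."""
    return [(day, hpi, day + hpi // 24)
            for day in INFECTION_DAYS
            for hpi in HARVEST_HPI]


def inoculum_dose(titer_per_ml=TITER_EID50_PER_ML, volume_ml=INOCULUM_ML):
    """EID50 delivered per egg, ASSUMING the suspension was used undiluted."""
    return titer_per_ml * volume_ml


if __name__ == "__main__":
    for day, hpi, age in harvest_schedule():
        print(f"infected day {day:2d}, harvested {hpi} hpi -> embryonic day {age}")
    print(f"dose per egg: {inoculum_dose():.0e} EID50 (assumes undiluted inoculum)")
```

The grid makes explicit why the 18-day, 72 hpi group sits at the edge of hatching: it is the only combination that reaches embryonic day 21.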

## RNA Extraction

RNA from all tissues was extracted using the RNeasy Plus Kit (QIAGEN Inc., Germantown, MD, United States) following the recommended protocol once tissues were homogenized. Tissue homogenization was performed using a Mini-Beadbeater-96 (Biospec Products, Bartlesville, OK, United States) for 1 min. Each tube placed in the bead beater contained 100 mg of the respective tissue, 600 µL of RLT Lysis buffer (provided in the RNeasy kit), and 10–15 silica beads of 1.5 mm diameter (Biospec Products, Bartlesville, OK, United States). After homogenization, 600 µL of lysate was added to a clean microcentrifuge tube and centrifuged at 10,000 rpm for 3 min to remove any debris. The supernatant was transferred to the gDNA eliminator column (provided in the RNeasy kit) and the recommended protocol was followed from this point forward.

#### cDNA Synthesis

cDNA synthesis was performed immediately after RNA extraction using the RT<sup>2</sup> First Strand Kit (QIAGEN Inc., Germantown, MD, United States) for the samples used with the RT<sup>2</sup> Profiler Array (one control and one infected embryo at 18 days and harvested 72 h post infection) and the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Carlsbad, CA, United States) for all other samples at all other time points. Manufacturer protocols were followed using 2 µg of each respective RNA sample. cDNA was stored at −20°C.

#### Transcriptional Profiling of Innate and Adaptive Immune Responses

The RT<sup>2</sup> Profiler Array Chicken Innate and Adaptive Immune Responses (QIAGEN Inc., Germantown, MD, United States) was used as an initial screen of the immune response of embryo to NDV infection. The array contains a panel of 84 immune-related genes from the innate (Pattern Recognition Receptors, Cytokines, Other), adaptive (Th1, Th2, Th17, and Treg markers, T Cell Activation, Cytokines, Other), humoral, and inflammatory responses, as well as the defense response to bacteria and virus (QIAGEN, 2014). All recommended kits and procedures were followed (QIAGEN Inc., Germantown, MD, United States). A master mix was prepared using the RT<sup>2</sup> qPCR Master Mix and cDNA according to manufacturer protocols and added to each well of the 96 well plate, provided by QIAGEN Inc. qPCR was then performed using an Applied Biosystems StepOne Plus real-time PCR instrument per recommended protocols.

At the Pennsylvania State University, the PowerUp SYBR Green Master Mix (Applied Biosystems, Carlsbad, CA, United States) with the StepOne Plus System (Applied Biosystems, Carlsbad, CA, United States) was used to analyze the SPF White Leghorn, inbred Leghorn, and inbred Fayoumi samples. At the Nelson Mandela African Institution of Science and Technology, the QuantiNova SYBR Green RT-PCR Kit (QIAGEN Inc., Germantown, MD, United States) with the QuantStudio 6 Flex Real-Time PCR System (Applied Biosystems, Carlsbad, CA, United States) was used to analyze the Kuroiler and local ecotype samples. The recommended manufacturer's protocols were followed for both kits. The differences in protocols were due to availability of reagents and machines at NM-AIST. The QuantiNova kit was recommended as most stable for shipment to Tanzania. The primers used for analysis can be found in Supplementary Table S2.

#### Data Analysis

Gene expression was analyzed using the ΔΔCt method, comparing infected ΔCt values (normalized with β-actin) with the average of negative control ΔCt values (normalized with β-actin) for each gene studied. The figures were generated in R<sup>1</sup> using log2 fold change expression data obtained from the ΔΔCt method for gene expression analysis. Statistical comparison of the Kuroiler and local ecotypes was performed in R using Student's t-test, and comparison of the Leghorn and Fayoumi sublines used non-parametric pairwise comparisons (Kruskal–Wallis test) in R (R Core Team, 2016).
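The ΔΔCt calculation described above can be sketched as follows. This is an illustration rather than the authors' analysis script (which was written in R): the function names are hypothetical, the Ct values are invented for the example, and the conversion to fold change assumes the usual approximation that amplification efficiency is close to 100%, so that one cycle corresponds to a twofold difference.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Delta-Ct: target-gene Ct normalized to the reference gene (beta-actin here)."""
    return ct_target - ct_reference


def log2_fold_changes(infected, controls):
    """Return the log2 fold change for each infected sample.

    `infected` and `controls` are lists of (ct_target, ct_reference) pairs.
    Delta-delta-Ct = delta-Ct(infected) - mean delta-Ct(controls), and the
    log2 fold change is its negative.
    """
    control_mean = mean(delta_ct(t, r) for t, r in controls)
    return [-(delta_ct(t, r) - control_mean) for t, r in infected]


def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change, 2^(-ddCt)."""
    return 2.0 ** log2_fc


if __name__ == "__main__":
    # Made-up Ct values for illustration only; these are not data from the study.
    controls = [(30.0, 18.0), (31.0, 19.0), (30.5, 18.5)]
    infected = [(24.0, 18.0), (25.0, 18.0)]
    for lfc in log2_fold_changes(infected, controls):
        print(f"log2 FC = {lfc:.2f}, fold change = {fold_change(lfc):.1f}")
```

Each infected embryo is compared against the mean of the control embryos rather than a paired control, which matches the description of using "the average of negative control ΔCt values".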

Gene network analysis was performed using the Ingenuity Pathway Analysis (IPA) software (QIAGEN Inc., Germantown, MD, United States<sup>2</sup> ) (QIAGEN, 2014). The data was uploaded to IPA and Core analysis was performed to generate the network in **Figure 2A**. From this the genes with the most gene–gene interactions and highest fold changes were selected (**Figure 2B**). The dataset was filtered for these six genes and the top diseases and biofunctions, in particular related to infectious disease and relevant to this study, were selected and the network in **Figure 2C** was generated. The same networking was performed for the four sublines (**Figure 4**).

## RESULTS

Studies were performed to define the point during embryonic development when the chicken embryo is capable of producing a robust and consistent immune response to NDV, as well as to identify specific immune genes with the greatest differential expression post infection. Developing chicken embryos of various ages were infected with the LaSota strain of NDV via the allantoic fluid and tissues were harvested 24, 48, and 72 h post infection (hpi) (Supplementary Table S1).

The comparative transcriptional profile of SPF White Leghorn chicken embryos (one control and one infected at 18 days, harvested 72 hpi) was determined using the Chicken Innate and Adaptive Immune Responses RT<sup>2</sup> Profiler Array (QIAGEN Inc., Germantown, MD, United States) as an initial screen to select for immune genes in the embryo that are differentially expressed during infection, since such studies have not been performed previously. The upregulated genes range from 1-fold (IRF6 gene) to 754-fold (Mx1 gene) increases, and the downregulated genes range from −1-fold (IL1R1 gene) to −11-fold (CCR4 gene) decreases (Supplementary Table S3). Three of the genes with the highest increase in expression were selected for a more comprehensive analysis of the immune response in the embryo: C-C Motif Chemokine Ligand 5 (CCL5), MX Dynamin Like GTPase 1 (Mx1), and Toll-like Receptor 3 (TLR3). These three genes are also known to play a role in the innate immune response, in particular the immune response to viral infection (QIAGEN, 2014). These data are publicly available at https://doi.org/10.18113/D3H952.

Once the three genes above were selected for further studies, the other collected embryo tissues from the SPF White Leghorns (three infected embryos and three control embryos per time point) were examined for the expression of these genes throughout the immune development of the chicken embryo (Supplementary Figure S1 and Table S1). Expression of all three immune genes was increased in the infected lung tissues at 18 days of embryonic development 72 hpi with considerably lower variation in expression than other infection time points (Supplementary Figure S1). Additionally, the Mx1 gene had a greater fold increase in the lung tissues as compared to CCL5 and TLR3 (average fold changes of 254, 4.2, and 5.6, respectively). Infection of the chicken embryo at 18 days, in particular with harvest of tissues 72 hpi, was deemed the most suitable time point for examining the immune response of the chicken embryo to NDV infection.

After determining a suitable time point to study chicken embryo immune gene expression, the analysis was broadened to include the Kuroiler and local ecotypes in Tanzania. The same three genes, CCL5, Mx1, and TLR3, were examined in the lung tissue of the Kuroiler and local ecotype embryos infected at 18 days of embryonic development and harvested 72 hpi (eight infected embryos and eight control embryos per line). Striking differences in expression of immune genes were observed between the Kuroiler and local ecotypes. The Kuroiler consistently expressed CCL5, Mx1, and TLR3 several orders of magnitude greater (164-, 19,816-, and 21.8-fold increase, respectively) than the local ecotype embryos (4.2-, 4.7-, and 1.8-fold increase, respectively) (**Figure 1**; data for individual replicates<sup>3</sup> ).
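To put the size of the between-line differences on a common scale, the reported average fold increases can be compared directly. The sketch below is only arithmetic on the averages quoted in the text; the names are hypothetical, and a ratio of averages is a rough summary that does not replace the replicate-level statistical test described in the Data Analysis section.

```python
import math

# Average fold increases in infected versus control lung tissue, as reported in the text.
KUROILER = {"CCL5": 164.0, "Mx1": 19816.0, "TLR3": 21.8}
LOCAL_ECOTYPE = {"CCL5": 4.2, "Mx1": 4.7, "TLR3": 1.8}


def line_ratio(gene):
    """Ratio of the Kuroiler to the local-ecotype fold increase for one gene."""
    return KUROILER[gene] / LOCAL_ECOTYPE[gene]


def orders_of_magnitude(gene):
    """log10 of the between-line ratio."""
    return math.log10(line_ratio(gene))


if __name__ == "__main__":
    for gene in KUROILER:
        print(f"{gene}: ratio = {line_ratio(gene):.0f}x, "
              f"log10(ratio) = {orders_of_magnitude(gene):.1f}")
```

On these averages the difference is largest for Mx1 (a ratio in the thousands), while the ratios for CCL5 and TLR3 are in the tens.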

<sup>1</sup>https://www.r-project.org/

<sup>2</sup>https://targetexplorer.ingenuity.com/

<sup>3</sup>https://doi.org/10.18113/D3H952

Pathway analysis using the Ingenuity Pathway Analysis (IPA) software (QIAGEN Inc., Germantown, MD, United States<sup>2</sup> ) was performed using the RT<sup>2</sup> Profiler Array data (expression fold change cutoff = 2) to determine important pathways and networks in the response to NDV in the chicken embryo (QIAGEN, 2014). The top canonical pathways included the role of pattern recognition receptors in recognition of bacteria and virus, iNOS signaling, communication between innate and adaptive immune cells, and Toll-like receptor signaling (p-values < 0.0001 for each pathway). The top regulator effector network was the antiviral response. The identified associated network diseases and functions, including infectious disease, the cell-mediated immune response, and organismal injury, were selected based on relevance to the immune response and merged to generate a network map of the gene–gene interactions from the RT<sup>2</sup> Profiler Array, removing genes that were not included in the array (**Figure 2A**). From this, an additional three genes, Interleukin-8 (IL-8), Interferon Regulatory Factor 1 (IRF1) and Signal Transducer and Activator of Transcription 1 (STAT1), were selected for further analysis since they were all highly increased in the infected chicken embryo as well as being interconnected in immune response networks (**Figure 2B**). Core analysis was performed with filtering for these genes to determine important networks of relevance to this study with NDV (**Figure 2C**). Remarkably, the diseases and disorders associated with the expression of these six genes include NDV infection, RNA virus infection and replication, Paramyxovirus replication, lung infection, Measles virus infection and replication (a paramyxovirus), Influenza A virus infection (an RNA virus of the respiratory tract), and susceptibility to infection (p-values < 0.0001 for all diseases and disorders listed) (QIAGEN, 2014). However, we note that even though IPA analysis has been previously applied to understand patterns of gene expression in chickens, the pathways have been validated primarily using human/mouse/rat data, and it is also recommended to use at least 200 genes for pathway analyses; hence, the results should be independently replicated (Krämer et al., 2014).
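The fold change cutoff of 2 applied before the pathway analysis can be illustrated with the four fold changes quoted in the Results text. This is a hedged sketch: the function names are hypothetical, the use of "greater than or equal to" at the boundary is an assumption, and it is not a description of how IPA applies its filters internally.

```python
# Signed fold changes quoted in the Results text; the full panel of 84 genes
# is in Supplementary Table S3 and is not reproduced here.
REPORTED_FOLD_CHANGES = {"Mx1": 754.0, "IRF6": 1.0, "IL1R1": -1.0, "CCR4": -11.0}


def passes_cutoff(signed_fold_change, cutoff=2.0):
    """True when the magnitude of a signed fold change meets the cutoff.

    Assumes the convention used in the text: positive values are fold
    increases and negative values are fold decreases.
    """
    return abs(signed_fold_change) >= cutoff


def filter_genes(fold_changes, cutoff=2.0):
    """Keep only the genes whose fold change magnitude meets the cutoff."""
    return {gene: fc for gene, fc in fold_changes.items() if passes_cutoff(fc, cutoff)}


if __name__ == "__main__":
    kept = filter_genes(REPORTED_FOLD_CHANGES)
    print(sorted(kept))
```

With only 84 genes on the array and fewer still after filtering, the input falls well below the roughly 200 genes recommended for pathway analysis, which is the caveat raised in the text.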

Next, the immune response profiles of 18-day NDV-infected embryos of the well-defined, highly congenic Fayoumi (M5.1 and M15.2) and Leghorn (Ghs6 and Ghs13) sublines were characterized (six infected embryos and six control embryos per subline for Ghs6, Ghs13, and M5.1; eight infected and six control embryos for the M15.2 subline). The congenic Fayoumi and Leghorn lines are highly inbred commercial poultry with inbreeding coefficients of 0.99, differing within each line only at the microchromosome bearing the MHC (Zhou and Lamont, 1999; Deist et al., 2017b). The Fayoumi has been described as less susceptible to infectious disease than the Leghorn (Zhou and Lamont, 1999; Cheeseman, 2007). Previous studies showed the Fayoumi line (M15.2) to be less susceptible to NDV than the Leghorn line (Ghs6) based on viral load post-challenge in ocular secretions at 6 (but not 2) days post infection (Hermann et al., 2016; Deist et al., 2017b), and have also identified differentially expressed genes in the trachea and lung transcriptomes of these two highly inbred chicken lines (Deist et al., 2017a,b). The results of our current investigation show striking differential expression of select genes in a subline-dependent manner, with Ghs13 having consistently higher expression of all genes except IL-8 (**Figure 3**; data for individual replicates<sup>4</sup> ). Importantly, the results show that the gene expression differences seem to be subline-dependent even more so than breed-dependent.

Similar analysis was performed, as in **Figure 2C**, with the average fold expression differences between challenged and control lung tissue in the inbred Leghorn and inbred Fayoumi sublines using IPA (QIAGEN Inc., Germantown, MD, United States<sup>5</sup> ). Interestingly, the high responders, Ghs13 and M5.1 (**Figures 4A,B**, respectively), show patterns similar to those of the RT<sup>2</sup> Profiler Array analysis (**Figure 3**). The low responders, Ghs6 and M15.2 (**Figures 4C,D**, respectively), differ, especially in the downregulation of one or more of the genes STAT1, CCL5, and IRF1, which was predicted to increase viral replication and infection in the Ghs6 and M15.2 sublines (QIAGEN, 2014).

<sup>4</sup>https://doi.org/10.18113/D3H952

#### DISCUSSION

Until now, most studies performed to examine the immune response of chickens and other avian species have been through expensive and laborious in vivo experiments or through in vitro cell culture experiments lacking host–pathogen interactions. The animal studies, requiring enhanced biosecurity facilities, are influenced by multiple confounding factors producing results that are sometimes difficult to interpret. Therefore, despite current work to examine the chicken immune response to NDV, a knowledge gap remains in the mechanisms of the immune response and the effects they have on susceptibility of chickens to infection. In the present study, we utilized the developing chicken embryo as a controlled and inexpensive approach to evaluate the innate immune mechanisms in response to NDV infection in chickens. Through these studies we have found evidence that the chicken embryo is capable of producing a robust signature of the immune response, in particular the innate immune response, to NDV infection, at least after day 10 of embryonic development. In particular, the suitability of the chicken embryo for immune investigation was confirmed through gene expression studies where variation in gene expression was greatly reduced at 18-day infection as compared to earlier infection time points (Supplementary Figure S1). Since the chicken embryo is immunocompetent and the respiratory system almost fully developed a few days prior to hatch, examining the response of the chicken embryo to respiratory viruses such as NDV may be possible without the introduction of other confounding variables post-hatch.

<sup>5</sup>https://targetexplorer.ingenuity.com/

FIGURE 2 | Pathway analysis of the RT<sup>2</sup> Profiler Array gene expression data using Ingenuity Pathway Analysis (IPA) software. The red colored circles represent relative increases in expression of that gene in the infected chicken embryo lung tissues versus control and green circles represent downregulation. Dashed lines indicate direct relationships and solid lines indicate indirect relationships. The blue lines represent inhibition, orange represents activation, and yellow represents inconsistent findings with downstream molecules. (A) Networks associated with diseases and functions of the differentially expressed genes from the profiler array (Infectious Diseases, Cell-mediated Immune Responses, Lymphoid Tissue Development, and Organismal Injury) were mapped together. (B) The six highly upregulated and interconnected genes (Mx1, CCL5, TLR3, STAT1, IL-8, and IRF1) from the array were selected for future analysis. (C) Filtering of the data to include only these six genes revealed diseases and functions relevant to NDV infection, including NDV infection, Paramyxovirus replication, RNA virus infection and replication, susceptibility to infection, lung infection, Measles virus infection and replication, and Influenza A virus infection (all p-values < 0.0001).

Using this more controlled approach to examine the chicken immune response, we were able to demonstrate that the innate immune response to NDV in the chicken embryo appeared to be breed- (**Figure 1**) and/or subline-dependent (**Figure 3**), with the possibility of a relation to the MHC type, as demonstrated by differences in the inbred Fayoumi and inbred Leghorn sublines, which only differ in the MHC-bearing chromosome. If the gene expression is, in fact, MHC type dependent, this could explain some of the large within-breed variation in expression demonstrated by the Tanzanian local ecotypes and the Kuroiler. These differences in the transcriptional response to NDV infection also demonstrate a possible relation between the innate immune gene expression and level of susceptibility of a particular line, since previous reports demonstrate that the Ghs6 Leghorn subline and M15.2 Fayoumi subline differ in the level of susceptibility to NDV (Deist et al., 2017b). That study also showed IL-8 and Mx1 to be differentially expressed between the two sublines studied (Deist et al., 2017b). A more recent investigation of differential gene expression in the lung transcriptome of these same chicken lines in response to NDV infection also demonstrated an activation of the IL-8 pathway in the resistant Fayoumi line, M15.2 (Deist et al., 2017a). While these previous studies used RNA-Seq-based approaches, longer time points post-infection (days rather than hours), and hatched chicks, and are hence not directly comparable, the similar patterns of expression of at least some of the loci observed in the chick embryo model are encouraging and need to be confirmed in future investigations.

FIGURE 4 | Pathway analysis of the gene expression data from each Fayoumi and Leghorn subline in Figure 3. The red colored circles represent relative increases in expression of that gene in the infected chicken embryo lung tissues versus control and green circles represent downregulation. Dashed lines indicate direct relationships and solid lines indicate indirect relationships. The blue lines represent inhibition, orange represents activation, and yellow represents inconsistent findings with downstream molecules. Ghs13 (A) and M5.1 (B), the highest responders, have similar patterns in the network analysis, and Ghs6 (C) and M15.2 (D) show deviations from this network due to the downregulation of some genes.

Multiple other studies have each discovered at least one of the six genes (Mx1, CCL5, TLR3, IL-8, IRF1, and STAT1) studied here with elevated expression levels in the chicken particularly in response to NDV, AIV, and IBV demonstrating the key role of these innate immune genes in the chicken's response to pathogen (Heidari et al., 2010; Rue et al., 2011; Matulova et al., 2013; Cheng et al., 2014; Kang et al., 2016; Ranaware et al., 2016; Deist et al., 2017b). Although the network analysis was only performed with small numbers of genes, relevant pathways to our study were generated. Most surprising were the pathways, top diseases, and disorders associated with the genes including NDV infection, paramyxovirus replication, and lung infection since
they are closely associated with the study of NDV infection in the chicken embryo lung directly. Another significant function of relevance to this study was susceptibility to infection. Since determining the feasibility of using the chicken embryo as a new approach to explore the immune response and susceptibility to NDV infection is an ultimate goal of our project, revealing a significant involvement of STAT1 and TLR3 in this response is promising for future studies. Other diseases and disorders include RNA virus infection and replication (NDV is an RNA virus), Measles virus infection and replication (a paramyxovirus, like NDV), and Influenza A virus infection (an RNA virus of the respiratory tract) (QIAGEN, 2014). In most cases, TLR3, STAT1, and IRF1 are involved in signaling pathways that lead to stimulation of the end targets, CCL5, Mx1, and IL-8, which then go on to stimulate other immune related cells or pathways (QIAGEN, 2014). Multiple studies have recognized the Type-I Interferon (IFN) inducible response, protecting cells against invading viral pathogens, as important in the innate immune response in chickens, and the six genes examined in this study are involved in that response (Xing et al., 2008; Schoggins et al., 2011; Ranaware et al., 2016). The six genes studied here have been shown to be important in the chicken immune response to viral pathogens, whether it is NDV, AIV, or IBV (Ko et al., 2002; Benfield et al., 2008; Sartika et al., 2011; Cong et al., 2013; Barjesteh et al., 2014; Cheng et al., 2014; Dou et al., 2014; Fulton et al., 2014; Ranaware et al., 2016; Deist et al., 2017a,b), which demonstrates how the chicken embryo immune response starts to mimic that of hatched chickens, validating the use of the chicken embryo for future studies of the immune response to avian pathogens.

The innate immune response is a complex and interconnected network that is dependent on many factors. Although the six genes examined in this study are involved in the innate immune response and there are direct interactions between these genes, further validation of the response, including a larger set of innate immune genes, is needed in order to obtain a better understanding of the innate immune mechanisms influencing the chicken embryo and chicken's response to NDV infection. It is interesting to note the differences seen in the pathway analysis from the Leghorn and Fayoumi sublines and a possible role for these genes in determining the level of susceptibility of a particular line to NDV infection. Most interestingly, the Ghs6 subline, previously found to be susceptible to NDV, differs from the pathway analyses of the other sublines by activating RNA virus replication rather than inhibiting it (QIAGEN, 2014; Deist et al., 2017b). Since only the Ghs6 and M15.2 sublines have been previously characterized for the level of susceptibility to NDV, expanding research in this area to include the other two sublines may provide a better insight into the use of the chicken embryo immune response as a tool to screen for the level of susceptibility to NDV of a particular line, and the role of the MHC complex. It is especially intriguing since the high responders, Ghs13 and M5.1, differ in the network analysis from the low responder, Ghs6. A more comprehensive study of the innate immune response of the different chicken lines, including a larger set of genes and larger sample sizes, and correlating these responses to the innate immune response and level of susceptibility of hatched chicks, using phenotypic characteristics such as viral load, mean death time, viral shedding, and antibody titers post NDV infection, will help provide insight into innate immune mechanisms of susceptibility to NDV.
Other important future studies include assessing whether similar response patterns are seen with more pathogenic field strains, which is especially important with regard to the response in backyard poultry.

Newcastle disease virus can be particularly devastating for farmers, especially smallholder farmers in Sub-Saharan Africa, due to high mortality rates in unvaccinated flocks. Uncovering innate immune mechanisms related to the level of susceptibility to NDV would have major impacts on productivity for these farmers by informing breeding strategies to produce more robust chickens. We note that the potential use of the chicken embryo as a model provides a framework for future studies of the development of the innate immune response of chickens to NDV (and other pathogens) allowing for screening of large numbers of birds to uncover genetic markers for both disease resistance/susceptibility and productivity traits, such as egg or meat production.

## AUTHOR CONTRIBUTIONS

MS, VK, JB, and FM designed the study. MS, SM, and MC performed the laboratory experiments. MS, RK, SM, and MC performed the data analysis. MS wrote the manuscript. RK, JR-B, MD, and SL critically revised the content. All authors read and approved the final manuscript.

## FUNDING

This research was supported by grant (OPP1083453) from the Bill and Melinda Gates Foundation (to VK and MS) and USAID Feed the Future Innovation Lab for Genomics to Improve Poultry and Hatch project #5357, USDA NIFA 2013-38420-20496 (to SL).

## ACKNOWLEDGMENTS

The authors thank the Kapur Lab, especially Dr. Lingling Li, for her guidance and support throughout the study. They thank Dr. Huaguang Lu and Dr. Suresh Kuchipudi and their lab groups for their guidance and technical assistance with the chicken embryo experiments. They also thank Dr. Beatus Lyimo for his support during experimentation at the Nelson Mandela African Institution of Science and Technology in Arusha, TZ.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00061/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Schilling, Katani, Memari, Cavanaugh, Buza, Radzio-Basu, Mpenda, Deist, Lamont and Kapur. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Gene Expression Response to Sea Lice in Atlantic Salmon Skin: RNA Sequencing Comparison Between Resistant and Susceptible Animals

Diego Robledo<sup>1</sup>\*, Alejandro P. Gutiérrez<sup>1</sup>, Agustín Barría<sup>2</sup>, José M. Yáñez<sup>2,3†</sup> and Ross D. Houston<sup>1</sup>\*†

<sup>1</sup> The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom, <sup>2</sup> Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile, <sup>3</sup> Aquainnovo S.A., Puerto Montt, Chile

#### Edited by:

Mark S. Fife, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Filippo Biscarini, Consiglio Nazionale delle Ricerche (CNR), Italy

Dirk-Jan De Koning, Swedish University of Agricultural Sciences, Sweden

#### \*Correspondence:

Diego Robledo, Diego.Robledo@roslin.ed.ac.uk

Ross D. Houston, ross.houston@roslin.ed.ac.uk

†These authors have contributed equally to this work.

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 14 May 2018 Accepted: 11 July 2018 Published: 03 August 2018

#### Citation:

Robledo D, Gutiérrez AP, Barría A, Yáñez JM and Houston RD (2018) Gene Expression Response to Sea Lice in Atlantic Salmon Skin: RNA Sequencing Comparison Between Resistant and Susceptible Animals. Front. Genet. 9:287. doi: 10.3389/fgene.2018.00287

Sea lice are parasitic copepods that cause large economic losses to salmon aquaculture worldwide. Frequent chemotherapeutic treatments are typically required to control this parasite, and alternative measures such as breeding for improved host resistance are desirable. Insight into the host–parasite interaction and mechanisms of host resistance can lead to improvements in selective breeding, and potentially novel treatment targets. In this study, RNA sequencing was used to study the skin transcriptome of Atlantic salmon (Salmo salar) parasitized with sea lice (Caligus rogercresseyi). The overall aims were to compare the transcriptomic profile of skin at louse attachment sites and "healthy" skin, and to assess differences in gene expression response between animals with varying levels of resistance to the parasite. Atlantic salmon pre-smolts were challenged with C. rogercresseyi, and growth and lice count measurements were taken for each fish. Twenty-one animals were selected, and RNA-Seq was performed on skin from a louse attachment site and on skin distal to attachment sites for each animal. These animals were classified into family-balanced groups according to the traits of resistance (high vs. low lice count) and growth during infestation. Overall comparison of skin from louse attachment sites vs. healthy skin showed that 4,355 genes were differentially expressed, indicating local up-regulation of several immune pathways and activation of tissue repair mechanisms. Comparison between resistant and susceptible animals highlighted expression differences in several immune response and pattern recognition genes, and also myogenic and iron availability factors. Components of the pathways involved in differential response to sea lice may be targets for studies aimed at improved or novel treatment strategies, or to prioritize candidate functional polymorphisms to enhance genomic selection for host resistance in commercial salmon breeding programs.

Keywords: Caligus rogercresseyi, Salmo salar, aquaculture, disease, parasite, RNA-Seq, host–parasite, differential expression

#### INTRODUCTION

Aquaculture is currently the fastest growing food industry (Food and Agriculture Organization of the United Nations, 2016) and is essential to meet increasing global demands for fish. However, the sustainability and prolonged success of any farming industry depends on effective disease prevention and control, and this tends to be particularly challenging for aquaculture. The aquatic
environment and high stock density can expedite pathogen spread, which has historically resulted in periodic mass mortality events (Lafferty et al., 2015; Food and Agriculture Organization of the United Nations, 2017) and ongoing challenges in disease prevention and control. While biosecurity measures, vaccination, nutrition, and medicines all play vital roles for several diseases, selective breeding to produce more resistant and tolerant aquaculture stocks is rapidly becoming a key component of the battle to prevent these outbreaks (Yáñez et al., 2014a; Palaiokostas et al., 2016).

Sea lice, ectoparasites of the family Caligidae, are one of the major disease problems facing the aquaculture industry, and salmon farming in particular. Atlantic salmon (Salmo salar) is the most important species in aquaculture, with a production value of 14.7 billion US dollars in 2014 (Food and Agriculture Organization of the United Nations, 2016); control of sea lice is therefore a primary goal for the industry. Sea lice-related economic losses to worldwide salmonid aquaculture were estimated at ∼430 million USD per annum (Costello, 2009). Two lice species are of primary concern for salmon farming: Lepeophtheirus salmonis in the Northern Hemisphere and Caligus rogercresseyi in the Southern Hemisphere (Johnson et al., 2004). These copepods parasitize salmon during the marine phase of the lifecycle by attaching to their skin or fins, and feeding on the blood and tissue. This leads to open wounds which can facilitate the entry of other pathogens. The impaired growth and secondary infections cause significant negative animal welfare and economic impact (Frazer et al., 2012). Despite extensive use of both chemical (i.e., hydrogen peroxide, emamectin benzoate, organophosphates, pyrethroids, or benzoyl ureas) and non-chemical treatments (i.e., fresh water bath) to control sea lice, their negative impact on salmon aquaculture has increased in the past years (Torrisen et al., 2013), and various sea lice populations have been reported to be resistant to the most common chemicals available for therapeutic control, such as emamectin benzoate, azamethiphos (organophosphate), deltamethrin (pyrethroid), and even hydrogen peroxide (Aaen et al., 2015). Therefore, alternative methods to control sea lice are currently being studied, including the use of probiotics to reduce salmon attractiveness for sea lice (Jodaa et al., 2016) or cohabitation with lice-eating species (Imsland et al., 2014; Leclercq et al., 2014).

Knowledge of the interaction between salmon and sea lice can help devise more effective prevention and treatment strategies. Therefore, a lot of effort has been put in characterizing the host response to sea lice infestation (reviewed in Fast, 2014). Interestingly, the outcome of infestation varies for different salmonid species (Johnson and Albright, 1992), with coho salmon (Oncorhynchus kisutch) showing rapid inflammatory response and epithelial hyperplasia, leading to parasite encapsulation and more than 90% reduction in lice loads (Fast, 2014). In comparison, Atlantic salmon (S. salar) is highly susceptible to sea lice infestation and seemingly cannot mount a fully effective immune response (Fast, 2014). Comparative transcriptomics has shown that iron sequestration, increased expression of pattern recognition receptors such as c-type lectins and upregulation of pro-inflammatory cytokines such as interleukin-1β are observed in salmon species resistant to sea lice (Sutherland et al., 2014). Interleukin-1β has also been implicated in successful responses to sea lice in other salmonid species such as pink salmon (Oncorhynchus gorbuscha) and coho salmon (Braden et al., 2012, 2015; Sutherland et al., 2015), and recently immunostimulant feeds up-regulating interleukin-1β in skin and spleen have shown some promising results to boost Atlantic salmon resistance to sea lice (Sutherland et al., 2017). While these studies have mainly focused on L. salmonis, similar findings have been observed in C. rogercresseyi infestation. For instance, comparative analyses of Atlantic and coho salmon parasitized with C. rogercresseyi showed that despite both showing upregulation of pro-inflammatory genes, the response was highly specific, characterized in coho by an activation of the TH1 response (Valenzuela-Muñoz et al., 2016). Another study linked iron sequestration and depletion mechanisms to the Atlantic salmon immune response to C. rogercresseyi (Valenzuela-Muñoz et al., 2017).

A promising and potentially complementary approach to existing control measures is to exploit natural genetic variation in farmed salmon populations to breed stocks with enhanced resistance to the parasite. The presence of significant genetic variation for resistance to C. rogercresseyi, with heritability values ranging between 0.1 and 0.34, demonstrates the feasibility of improving this trait by selective breeding in Atlantic salmon (Lhorente et al., 2014; Yáñez et al., 2014b). Current evidence indicates that host resistance to sea lice in Atlantic salmon has a highly polygenic genetic basis, with little evidence for major QTL (Ødegård et al., 2014; Gharbi et al., 2015; Correa et al., 2016, 2017; Tsai et al., 2016). Therefore, genomic selection using genome-wide markers to predict lice resistant breeding values has been widely applied in commercial Atlantic salmon breeding programs, with a relative advantage compared to pedigree selection of 10–27% (Tsai et al., 2016; Correa et al., 2017). Understanding the underlying functional basis of genetic resistance to sea lice can lead to improved methods of selective breeding. For example, incorporating functional variants into genomic prediction models could help improve prediction accuracy, in particular for cross-population prediction (MacLeod et al., 2016). Functional annotation of reference genomes is pertinent to this process, and the emerging Functional Annotation of All Salmonid Genomes (FAASG) project (Macqueen et al., 2017) is aiming to improve genome annotation for Atlantic salmon (among other species). 
Further, the discovery of putatively causative genes and variants could, in the near future, lead to their introduction into populations or species where they have never been present through the use of genome editing, for example using CRISPR-Cas9 technology, which has been successfully applied in salmon to knock out two genes related to pigmentation (tyrosinase and solute carrier family 45 member 2) and the dnd (dead end) gene, producing albino (Edvardsen et al., 2014) and germ cell-free salmon (Wargelius et al., 2016), respectively.

Expression differences between Atlantic salmon resistant and susceptible families in response to L. salmonis for 32 immune genes suggested that resistant fish are better at avoiding immunosuppression (Holm et al., 2015). The same study found suggestive evidence that physical tissue barriers such as enhanced mucus production do not contribute to resistance (Holm et al., 2015). However, to our knowledge, the functional basis of genomic resistance to sea lice in Atlantic salmon has not been studied on a genome-wide scale, nor has it been explored in response to C. rogercresseyi. RNA sequencing can provide a first layer toward a holistic view of the host response to parasite infection, which in turn can highlight specific genes, pathways, and networks involved in the host–parasite interaction. RNA sequencing can also be used to identify single nucleotide polymorphisms in transcribed regions, and to assess the putative impact of those genetic markers on transcript and protein function. The effect of these markers on gene expression (and ultimately host resistance) can be assessed by allele-specific expression or expression QTL studies, leading to a shortlist of candidate functional variants.

The overall aims of the current study were to compare the transcriptome profile of salmon skin at louse attachment sites and "healthy" skin (from the same fish), and to evaluate differences in these profiles between animals with varying levels of resistance to the parasite. To achieve this, challenged animals were classified into family-balanced groups according to resistance (based on high vs. low lice count) and growth during infestation, and RNA sequencing was performed on individual samples. By comparing resistant vs. susceptible samples, genes and pathways related to local immune response and host resistance were identified, and their potential role discussed.

#### MATERIALS AND METHODS

#### Experimental Design

A total of 2,668 Atlantic salmon (S. salar) pre-smolts (average weight: 136 g) from 104 families from the breeding population of AquaInnovo (Salmones Chaicas, Xth Region, Chile) were experimentally challenged with C. rogercresseyi (chalimus II–III). This population will be used for a future study on sea lice resistance genetic architecture and genomic selection. Briefly, infestation with the parasite was carried out by using 13–24 copepodids per fish and stopping the water flow for 6 h after infestation. Eight days after the infestation, fish were euthanized and fins from each fish were collected and fixed for processing and lice counting. Forty-two samples from 21 fish from 6 different families (2–5 fish per family) were selected for RNA sequencing (Supplementary File S1) based on the traits of interest (number of sea lice attached to their fins and growth during challenge). Skin samples (both from attachment sites and healthy skin) were obtained from each animal and stored in RNAlater at 4 °C for 24 h, and then at −20 °C until RNA extraction for sequencing.

#### RNA Extraction and Sequencing

For all 42 samples, a standard TRI Reagent RNA extraction protocol was followed. Briefly, approximately 50 mg of skin was homogenized in 1 ml of TRI Reagent (Sigma, St. Louis, MO, United States) by shaking using 1.4 mm silica beads, then 100 µl of 1-bromo-3-chloropropane (BCP) was added for phase separation. This was followed by precipitation with 500 µl of isopropanol and subsequent washes with 65–75% ethanol. The RNA was then resuspended in RNase-free water and treated with Turbo DNase (Ambion). Samples were then cleaned up using Qiagen RNeasy Mini kit columns and their integrity was checked on Agilent 2200 Bioanalyzer (Agilent Technologies, United States). Thereafter, the Illumina TruSeq mRNA stranded RNA-Seq Library Prep Kit protocol was followed directly. Libraries were checked for quality and quantified using the Bioanalyzer 2100 (Agilent), before being sequenced on three lanes of the Illumina HiSeq 4000 instrument using 75 base paired-end sequencing at Edinburgh Genomics, United Kingdom. Raw reads have been deposited in NCBI's Sequence Read Archive (SRA) under Accession No. SRP100978.

#### Read Mapping

The quality of the sequencing output was assessed using FastQC v.0.11.5.<sup>1</sup> Quality filtering and removal of residual adaptor sequences was conducted on read pairs using Trimmomatic v.0.32 (Bolger et al., 2014). Specifically, Illumina-specific adaptors were clipped from the reads, leading and trailing bases with a Phred score less than 20 were removed, and the read was trimmed if the sliding window average Phred score over four bases was less than 20. Only read pairs in which both reads were longer than 36 bp post-filtering were retained. Filtered reads were mapped to the most recent Atlantic salmon genome assembly (ICSASG\_v2; GenBank Accession No. GCF\_000233375.1; Lien et al., 2016) using STAR v.2.5.2b (Dobin et al., 2013). The maximum number of mismatches for each read pair was set to 10% of trimmed read length, and minimum and maximum intron lengths were set to 20 bases and 1 Mb, respectively. Uniquely mapped paired-reads were counted and assigned to genes (NCBI S. salar Annotation Release 100) using FeatureCounts (Liao et al., 2014), included in the SourceForge Subread package v.1.5.0. Only reads with both ends mapped to the same gene were considered in downstream analyses.
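The quality-trimming rules described above can be illustrated with a minimal Python sketch. This is not Trimmomatic's implementation; it is a simplified re-statement of the three steps (leading/trailing removal, four-base sliding window, minimum length) applied to a list of Phred scores. The function names `trim_read` and `keep_pair` are hypothetical, and treating a read of exactly 36 bp as retained is an assumption.

```python
def trim_read(quals, window=4, min_q=20, min_len=36):
    """Return (start, end) of the retained slice of a read, or None if dropped.

    quals is a list of per-base Phred scores. Simplified sketch, not the
    Trimmomatic algorithm itself.
    """
    start, end = 0, len(quals)
    # Remove leading bases with quality below the threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Remove trailing bases with quality below the threshold.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    # Scan windows from the 5' end; cut the read at the first window
    # whose mean quality falls below the threshold.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            end = i
            break
    # Assumption: reads shorter than min_len are dropped, min_len itself is kept.
    if end - start < min_len:
        return None
    return start, end


def keep_pair(quals_r1, quals_r2, **kwargs):
    """Retain a read pair only if both mates survive trimming."""
    r1 = trim_read(quals_r1, **kwargs)
    r2 = trim_read(quals_r2, **kwargs)
    if r1 is None or r2 is None:
        return None
    return r1, r2
```

A pair is discarded as soon as either mate falls below the length threshold, which mirrors the statement that only pairs with both reads passing the filter were retained.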

#### Differential Expression

Differential expression analyses and gene functional and pathway enrichment analyses were performed using R v.3.3.1 (R Core Team, 2014). Gene count data were used to estimate differential gene expression using the Bioconductor package DESeq2 v.3.4 (Love et al., 2014). Briefly, size factors were calculated for each sample using the median of ratios method and count data were normalized to account for differences in library depth; next, gene-wise dispersion estimates were fitted to the mean intensity using a parametric model and shrunk toward the expected dispersion values; finally, a negative binomial model was fitted for each gene and the significance of the coefficients was assessed using the Wald test. The Benjamini–Hochberg false discovery rate (FDR) multiple test correction was applied, and transcripts with FDR < 0.05 and absolute log<sub>2</sub> fold change (FC) values > 0.5 were considered differentially expressed genes. Hierarchical clustering and principal component analyses were performed to visually identify outlier samples that did not cluster close to other samples in the same category (lice attachment site or healthy skin), which were then removed from the analyses as sampling errors could not be discounted. PCA plots were created using the R package factoextra.<sup>2</sup>

<sup>1</sup>http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
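As an illustration of the normalization step, the median of ratios method can be sketched in a few lines of Python. This is a didactic re-implementation under stated assumptions, not the DESeq2 code used in the study: genes with a zero count in any sample are skipped when computing the reference, and the names `size_factors` and `is_differentially_expressed` are hypothetical. The second function simply restates the FDR and fold-change thresholds given above.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts is a list of genes, each a list of raw counts per sample.
    Assumption: only genes with non-zero counts in every sample are used,
    so that the geometric mean across samples is defined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if all(c > 0 for c in gene):
            # Log of the geometric mean of this gene across samples.
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            for j, c in enumerate(gene):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor of a sample is the median ratio to the reference.
    return [math.exp(median(r)) for r in log_ratios]


def is_differentially_expressed(padj, log2_fc, fdr=0.05, min_abs_lfc=0.5):
    """Apply the thresholds stated in the text: FDR < 0.05 and |log2 FC| > 0.5."""
    return padj < fdr and abs(log2_fc) > min_abs_lfc
```

Dividing each raw count by the size factor of its sample gives depth-normalized counts; for two libraries that differ only by a twofold sequencing depth, the two size factors differ by a factor of two and the normalized counts coincide.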

#### Pathway Enrichment

Gene Ontology (GO) enrichment analyses were performed using Blast2GO v.4.1 (Conesa et al., 2005). Briefly, genes showing >10 reads in >90% of the samples were annotated against the manually curated protein database Swiss-Prot (Bairoch et al., 2004) and GO terms were assigned to them using Blast2GO. GO enrichment for specific gene lists was tested against the whole set of expressed genes using Fisher's exact test. GO terms with ≥5 DE genes assigned and showing a Benjamini–Hochberg FDR corrected p-value < 0.05 were considered enriched. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed using KOBAS v3.0.3 (Wu et al., 2006). Briefly, genes showing >10 reads in >90% of the samples were annotated against the KEGG protein database (Kanehisa and Goto, 2000) to determine KEGG Orthology. KEGG enrichment for specific gene lists was tested by comparison to the whole set of expressed genes using Fisher's exact test. KEGG pathways with ≥5 differentially expressed (DE) genes assigned and showing a Benjamini–Hochberg FDR corrected p-value < 0.05 were considered enriched.
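The enrichment logic described above can be sketched as follows. This is a minimal illustration, not the Blast2GO or KOBAS implementation. It assumes a one-sided (over-representation) Fisher's exact test, computed as a hypergeometric upper tail, and it assumes that the ≥5 DE genes filter is applied after the Benjamini–Hochberg correction; neither detail is specified in the text. All function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(X >= k).

    k: DE genes annotated with the term, n: all DE genes,
    K: expressed genes annotated with the term, N: all expressed genes.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(term_counts, n_de, n_expressed, min_de=5, fdr=0.05):
    """term_counts maps term -> (DE genes with term, expressed genes with term)."""
    terms = list(term_counts)
    pvals = [
        fisher_enrichment_p(term_counts[t][0], n_de, term_counts[t][1], n_expressed)
        for t in terms
    ]
    padj = benjamini_hochberg(pvals)
    return [t for t, q in zip(terms, padj) if term_counts[t][0] >= min_de and q < fdr]
```

Exact integer binomial coefficients are used so that the tail probability stays well defined for gene sets of realistic size; a statistics library would normally be preferred in practice.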

#### RESULTS AND DISCUSSION

#### Disease Challenge

A total of 2,632 fish belonging to 105 families from a commercial breeding program were challenged with C. rogercresseyi copepods, and euthanized for sampling 8 days post-challenge. Average lice burden per fish was 38 ± 16, and the estimated heritability of sea lice load was 0.28 ± 0.04 (unpublished results); the differences in sea lice counts between fish therefore have a genetic component. Fish were selected for RNA sequencing based on the traits of resistance, measured as number and concentration of lice per fish, and weight and length gain since the start of the challenge, which may reflect the ability of the fish to cope with the infestation. The selected fish allowed for 8 vs. 8 comparisons between family-matched fish showing differential resistance (26.2 ± 5.5 vs. 54.9 ± 13.5 sea lice per fish) and differential growth during infestation (7.0 ± 4.3 vs. 28.8 ± 12.3 weight gain percentage). A total of 42 samples (21 fish, skin from sites of louse attachment and healthy skin) were sequenced, resulting in an average of ∼27.9 ± 2.7 million reads per sample. After trimming, these were aligned against the salmon reference genome (ICSASG\_v2; GenBank Accession No. GCF\_000233375.1; Lien et al., 2016) and levels of gene expression were estimated according to the official salmon genome annotation (NCBI S. salar Annotation Release 100). An average of 19 M trimmed reads per sample were assigned to genes and used for downstream analyses of gene expression. All raw sequence data are available in NCBI's SRA under BioProject Accession No. SRP100978, and may be a useful contribution to the Functional Annotation of All Salmonid Genomes initiative (FAASG; Macqueen et al., 2017).

#### Louse Attachment Sites Versus Healthy Skin

Principal component analysis of gene expression (**Figure 1**) revealed a relatively clear cluster of healthy skin samples, while lice-attachment samples were more scattered, probably reflecting variation in the individual response to sea lice. Differential expression between healthy and louse attachment sites resulted in 4,355 DE genes (Supplementary File S2), with a higher number of up-regulated (more expressed in attachment sites) than down-regulated genes (n = 3,114 vs. n = 1,241). Among these DE genes were well-known components of the innate immune response such as interleukins, interferon response factors and complement components (**Figure 2A**). GO term and KEGG pathway analyses (Supplementary File S2) revealed a clear enrichment of immune pathways and functions among the up-regulated genes (**Figure 2B**), highlighting a localized immune response strongly related to cytokine activity. A similar scenario has been observed in other salmonids such as coho salmon where resistance to sea lice has been associated with early inflammation in skin and head kidney, which results in epithelial hyperplasia and often parasite encapsulation and removal of the sea lice within 2 weeks (Johnson and Albright, 1992; Fast et al., 2002). In pink salmon, an early and high expression of pro-inflammatory genes (IL-8, TNFα-1, and IL-1β) has been suggested as a mechanism of rapid louse rejection (Fast et al., 2007). The classical complement pathway has also been linked to resistance of host fish to parasitic copepod infection (Fast, 2014). The results presented here indicate that despite a marked upregulation of the local inflammatory response and complement pathway in Atlantic salmon, in part resembling the response of coho salmon or pink salmon, this does not seem to be sufficient to successfully respond to the louse attachment and feeding.

In addition to the expected innate immune response observed above, cell division related processes were also clearly upregulated at louse attachment sites, and well-characterized genes involved in tissue repair such as fibroblast growth factor-binding protein 1 and Epigen showed significant differences between lice attachment sites and healthy skin (FC > 3). Several genes related to the cell matrix and cell adhesion also had higher expression at attachment sites (e.g., cadherin-13, integrin alpha-2, desmoplakin, and various keratin and collagen genes). Cell proliferation is the main response to skin wounds in fish (Iger and Abraham, 1990), and these results are consistent with those previously found in the early response to L. salmonis (Skugor et al., 2008). Several mucins were also found to have higher expression at attachment sites, pointing toward increased mucus production and secretion, which can also be a typical response to wounding in fish (Fast, 2014).

#### Resistance

<sup>2</sup>http://www.sthda.com/english/rpkgs/factoextra/

Resistance, measured as number of sea lice per fish, was evaluated using two different approaches: correlation between gene expression and sea lice loads, and differential expression between family-matched fish showing high and low sea lice loads.

#### Correlation Between Gene Expression and Sea Lice Loads

We studied the correlation of gene expression and sea lice counts in healthy skin and sea lice attachment sites. Genes showing |r| > 0.75 with sea lice counts were considered of interest (Supplementary File S3).
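As an illustration of this screening step, the following sketch computes a correlation coefficient between each gene's expression and the lice counts and keeps genes whose absolute correlation exceeds 0.75. The use of Pearson's r is an assumption (the type of correlation is not specified here), and the gene names and values are hypothetical toy data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)


def genes_of_interest(expression, lice_counts, threshold=0.75):
    """Return {gene: r} for genes with |r| above the threshold."""
    hits = {}
    for gene, values in expression.items():
        r = pearson_r(values, lice_counts)
        if abs(r) > threshold:  # NaN compares False, so such genes are skipped
            hits[gene] = r
    return hits


# Toy data (hypothetical): expression values for four fish and their lice counts.
toy_lice = [10, 20, 30, 40]
toy_expression = {
    "gene_up": [1.0, 2.0, 3.0, 4.0],
    "gene_down": [4.0, 3.0, 2.0, 1.0],
    "gene_flat": [2.0, 1.0, 2.0, 1.0],
}
toy_hits = genes_of_interest(toy_expression, toy_lice)
```

Both positively and negatively correlated genes are retained, because the threshold is applied to the absolute value of r.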

The expression levels of five immune receptors in healthy skin were positively correlated with sea lice loads (**Table 1**). Macrophage mannose receptor 1 (MRC1) shows the highest positive correlation with number of sea lice (r = 0.87), and also the highest expression difference between louse attachment vs. healthy skin (FC = 4.79). MRC1 is a C-type lectin receptor, expressed in macrophages, dendritic cells, and skin in humans. MRC1 plays a role both in innate and adaptive immunity and also acts as a recognition receptor for different pathogens such as bacteria, viruses, or fungi (East and Isacke, 2002). C-type lectin receptor A (r = 0.81; FC = −1.34) is another lectin receptor involved in antigen recognition and immune response (Geijtenbeek and Gringhuis, 2009). Lectins such as MRC1 and CLEC4E have been found to be induced by glucosinolate-enriched feeds in Atlantic salmon, which also reduced lice counts between 17 and 25% (Holm et al., 2016), and are also upregulated in response to sea lice in the more resistant pink salmon species (Sutherland et al., 2014). Lectins have been reported to activate the immune system in response to parasites in several different species (Vázquez-Mendoza et al., 2013; Hoving et al., 2014); therefore, modulation of these genes represents a possible route to enhance Atlantic salmon immune responses to sea lice. Two immune receptors were negatively correlated with number of sea lice, CD97 (r = −0.84) and suppressor of cytokine signaling 5 (SOCS5; r = −0.76). CD97 regulates cytokine production and T-cell activation and proliferation (Capasso et al., 2006; Abbott et al., 2007), while SOCS5 is part of the cytokine-mediated signaling pathway, and acts as a negative regulator of inflammatory response and other immune-related pathways (Seki et al., 2002). Since it was not possible to take skin samples prior to infection, it is difficult to distinguish between cause and effect; i.e., it is plausible that the negative correlation of these genes with number of sea lice is simply indicating that the immune system of the host responds proportionally to the degree of lice infestation. Nonetheless, the data support a major role for these genes in the host response to sea lice, and the differences in lice count between fish have a genetic component, to which these genes may contribute.

[**Figure 2** caption, continued] arbitrarily positioned along the x-axis. (B) Selection of GO terms enriched amongst DE genes between healthy and injured skin.

Amongst genes without a (well-known) immune function, there was an association between SUMO1 (r = 0.76) and SUMO3 (r = −0.91) expression and sea lice loads. Small ubiquitin-like modifier (SUMO) proteins are small proteins similar to ubiquitins that are covalently attached to other proteins to modify their function. According to the gene expression data, SUMO1 seems to be preferred over SUMO3 in salmon upon sea lice infestation. Although post-translational modifications have been barely explored in fish, in mice SUMOylation has been shown to be involved in modulation of host innate immune response to pathogens (Decque et al., 2016). SUMOylation is also a very active field of research in plants, where SUMO is known to be involved in many important processes such as plant response to environmental stresses, including pathogens (Park et al., 2011). It would be interesting to further study the role of SUMO in modulating Atlantic salmon responses to sea lice.

The results were markedly different in lice-attachment sites (Supplementary File S3), and congruent with differential expression between lice-attachment and healthy skin, with inflammatory genes such as toll-like receptor 12 or caspase 3 showing high correlations with sea lice loads. Similarly, one of the sox9 paralogs (sox9a) was also highly correlated with lice loads. Sox9 has a pro-proliferation function in human epidermal keratinocytes (Shi et al., 2013), and therefore this transcription factor is probably promoting wound healing in sea lice attachment sites. Finally, the gene hepcidin-1 is also correlated with sea lice counts in lice-attachment sites. Hepcidin is a regulator of iron metabolism, which as mentioned in the introduction has been associated with response to C. rogercresseyi (Valenzuela-Muñoz et al., 2017).

#### Differential Expression Between High and Low Sea Lice Loads

The samples for RNA sequencing were chosen to enable 8 vs. 8 comparison between family-matched fish (three families with two fish per group, two families with one fish) with high and low values for resistance (26.2 ± 5.5 vs. 54.9 ± 13.5 sea lice per fish). There were 43 genes significantly differentially expressed between resistant and susceptible fish (Supplementary File S4). All but one were from comparison of healthy skin samples between the two groups, which seems to suggest that the differences in resistance are systemic rather than local to louse attachment sites. The susceptible group had higher expression levels for genes involved in muscle contraction like troponins and myosins, which was also highlighted by GO enrichment analyses (**Figure 3**). Myosins and troponins have previously been identified as genes that respond to sea lice attachment in salmon skin (Holm et al., 2015). Further, Caligus infection is known to induce increased enzyme activity in muscle tissue (Vargas-Chacoff et al., 2017), and behavioral changes in the fish such as flashing and jumping are associated with ectoparasite removal (Furevik et al., 1993; Magnhagen et al., 2008). It has been recently reported that inactivity or reduced swimming activity contribute to resistance to sea lice (Bui, 2017), so it is possible that the high lice counts of susceptible fish in this study are due to higher activity levels with associated expression of muscle contraction related genes. In turn, high lice burden can provoke behavioral responses increasing fish activity, which results in the up-regulation of muscle genes, increasing the expression differences between resistant-passive-low lice fish and susceptible-active-high lice fish.

Two heme oxygenase genes, encoding enzymes that catalyze the degradation of heme, also had higher expression levels in susceptible samples (**Figure 3**), which is consistent with the positive correlation with lice loads of the iron-sequestration gene hepcidin. These genes have been previously shown to be up-regulated in response to Caligus infection (Valenzuela-Muñoz and Gallardo-Escárate, 2017). Importantly, iron availability was found to be reduced in the highly resistant species pink salmon infected with L. salmonis (Sutherland et al., 2014), and hematocrit and anemia were also found to be reduced in chum salmon (Oncorhynchus keta) in response to sea lice (Jones et al., 2007). It is therefore plausible that the more effective reduction of iron availability in Atlantic salmon (perhaps behaving more similarly to the resistant pink salmon) might be related to increased resistance to sea lice.

Finally, three immune receptors showed higher expression in susceptible samples (**Figure 3**): C-X-C chemokine receptor type 2 is a receptor for IL-8, whose binding causes activation of neutrophils, while C-type lectin receptors A (also found to be positively correlated with sea lice counts in healthy skin) and B are lectin receptors with an important role in pathogen recognition and immunity (Geijtenbeek and Gringhuis, 2009), as previously discussed. While it is clear that resistance and host response to sea lice are multifactorial in nature, these genes related to muscle contraction, iron availability and immunity may be targets for functional validation in future studies, and for cross-referencing with genome-wide association analyses to identify candidate causative genes and variants.

#### Growth During Infestation

Differences in weight gain percentage from the start to the end of the trial were also investigated. Weight gain during infestation did not show any significant correlation with initial weight (r = −0.27, p = 0.10), sea lice counts (r = 0.12, p = 0.45) or sea lice density (r = 0.19, p = 0.24) in our dataset, and the means for these three traits are not significantly different between our groups showing differential growth during infestation (t-test p-values > 0.35). Family-matched fish (8 vs. 8; three families with two fish per group, two families with one fish) with differential weight gain during infestation (7.0 ± 4.3 vs. 28.8 ± 12.3 weight gain percentage) were compared. A total of 24 and 1 genes were found differentially expressed between fish showing high and low weight gains in healthy and sea louse attachment site samples, respectively (Supplementary File S5). The gene differentially expressed in injured skin, solute carrier family 15 member 1 (SLC15A1), also showed the lowest p-value and highest FC in healthy skin (FC = 3.38, p = 0.003). The SLC15A1 protein is a membrane transporter that mediates the uptake of dipeptides and tripeptides; in humans, this gene is expressed in the intestinal epithelium and plays a major role in protein absorption (Adibi, 1997). Another interesting DE gene is myogenic regulatory factor 6 (MYF6; FC = 0.72, p = 0.04). Myogenic regulatory factors are transcription factors that regulate muscle development (Perry and Rudnick, 2000); in Senegalese sole, decreased expression of these factors was observed in fast muscle when fed with a high-lipid content diet, which caused reduced growth (Campos et al., 2010). While skin is unlikely to be a highly suitable tissue to study genes underlying fish growth during sea lice infestation, both myogenic factors and increased nutrient absorption, and specifically MYF6 and SLC15A1, are good candidates to better understand growth impairment differences under sea lice infestation.

#### CONCLUSION

The results of this study highlight that the early gene expression response of Atlantic salmon to sea lice involves up-regulation of many different components of the immune system (inflammatory response, cytokine production, TNF and NF-kappa B signaling and complement activation) along with tissue repair activation. The comparison of resistant vs. susceptible animals highlighted enrichment of pathways related to fish activity, iron availability and receptors modulating pathogen recognition and immune response. Overall, this study contributes to an improved understanding of Atlantic salmon early response to sea lice in skin, and into the gene expression profiles underpinning genetic resistance to sea lice in salmon. The identified pathways and genes may be targets for future studies aimed at development of new treatments, vaccines, or prevention strategies. The data can also be cross-referenced with high power genome-wide association studies to help prioritize putative causative genes and variants that have potential to improve genomic selection programs for genetic improvement of resistance to this industry's most serious disease.

## DATA AVAILABILITY

The raw reads generated for this study have been deposited in NCBI's Sequence Read Archive (SRA) under Accession No. SRP100978.

## ETHICS STATEMENT

The lice challenge experiments were performed under local and national regulatory systems and were approved by the Animal Bioethics Committee (ABC) of the Faculty of Veterinary and Animal Sciences of the University of Chile (Santiago, Chile), Certificate No. 01-2016, which based its decision on the Council for International Organizations of Medical Sciences (CIOMS) standards, in accordance with the Chilean standard NCh-324-2011.

## AUTHOR CONTRIBUTIONS

RH, JY, and DR were responsible for the concept and design of this work and drafted the manuscript. AB managed the collection of the samples. AG performed the molecular biology experiments. DR performed bioinformatic and statistical analyses. All authors read and approved the final manuscript.

## FUNDING

This work was supported by an RCUK-CONICYT grant (Grant No. BB/N024044/1) and by Institute Strategic Funding Grants to The Roslin Institute (Grant Nos. BBS/E/D/20002172, BBS/E/D/30002275, and BBS/E/D/10002070). Edinburgh Genomics was partly supported through core grants from NERC (Grant No. R8/H10/56), MRC (Grant No. MR/K001744/1), and BBSRC (Grant No. BB/J004243/1). DR was supported by a Newton International Fellowship of the Royal Society (Grant No. NF160037).

## ACKNOWLEDGMENTS


The authors would like to thank Aquainnovo and Salmones Chaicas for providing the biological material and phenotypic records of the experimental challenges.

## REFERENCES


### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00287/full#supplementary-material



**Conflict of Interest Statement:** JY was supported by Aquainnovo S.A.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Robledo, Gutiérrez, Barría, Yáñez and Houston. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# RNA Sequencing (RNA-Seq) Reveals Extremely Low Levels of Reticulocyte-Derived Globin Gene Transcripts in Peripheral Blood From Horses (Equus caballus) and Cattle (Bos taurus)

#### Edited by:

John Anthony Hammond, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Chuanju Dong, Henan Normal University, China; Luiz Lehmann Coutinho, Universidade de São Paulo, Brazil

#### \*Correspondence:

David E. MacHugh david.machugh@ucd.ie

#### †Present Address:

Nicolas C. Nalpas, Quantitative Proteomics and Proteome Centre Tübingen, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany; Kevin Rue-Albrecht, Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 03 April 2018 Accepted: 09 July 2018 Published: 14 August 2018

#### Citation:

Correia CN, McLoughlin KE, Nalpas NC, Magee DA, Browne JA, Rue-Albrecht K, Gordon SV and MacHugh DE (2018) RNA Sequencing (RNA-Seq) Reveals Extremely Low Levels of Reticulocyte-Derived Globin Gene Transcripts in Peripheral Blood From Horses (Equus caballus) and Cattle (Bos taurus). Front. Genet. 9:278. doi: 10.3389/fgene.2018.00278

Frontiers in Genetics | www.frontiersin.org 1 August 2018 | Volume 9 | Article 278

Carolina N. Correia<sup>1</sup>, Kirsten E. McLoughlin<sup>1</sup>, Nicolas C. Nalpas<sup>1†</sup>, David A. Magee<sup>1</sup>, John A. Browne<sup>1</sup>, Kevin Rue-Albrecht<sup>1†</sup>, Stephen V. Gordon<sup>2,3</sup> and David E. MacHugh<sup>1,3</sup>\*

<sup>1</sup> Animal Genomics Laboratory, UCD School of Agriculture and Food Science, UCD College of Health and Agricultural Sciences, University College Dublin, Dublin, Ireland, <sup>2</sup> UCD School of Veterinary Medicine, UCD College of Health and Agricultural Sciences, University College Dublin, Dublin, Ireland, <sup>3</sup> UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland

RNA-seq has emerged as an important technology for measuring gene expression in peripheral blood samples collected from humans and other vertebrate species. In particular, transcriptomics analyses of whole blood can be used to study immunobiology and develop novel biomarkers of infectious disease. However, an obstacle to these methods in many mammalian species is the presence of reticulocyte-derived globin mRNAs in large quantities, which can complicate RNA-seq library sequencing and impede detection of other mRNA transcripts. A range of supplementary procedures for targeted depletion of globin transcripts have, therefore, been developed to alleviate this problem. Here, we use comparative analyses of RNA-seq data sets generated from human, porcine, equine, and bovine peripheral blood to systematically assess the impact of globin mRNA on routine transcriptome profiling of whole blood in cattle and horses. The results of these analyses demonstrate that total RNA isolated from equine and bovine peripheral blood contains very low levels of globin mRNA transcripts, thereby negating the need for globin depletion and greatly simplifying blood-based transcriptomic studies in these two domestic species.

Keywords: blood, cattle, globin, horses, pigs, reticulocyte, RNA-seq, transcriptome

## INTRODUCTION

It is increasingly recognised that new technological approaches are urgently required for infectious disease diagnosis, surveillance, and management in burgeoning domestic animal populations as livestock production intensifies across the globe (Thornton, 2010; Nabarro and Wannous, 2014; Animal Task Force, 2016). In this regard, new strategies have emerged that leverage peripheral blood gene expression to study host immunobiology and to identify panels of RNA transcript biomarkers that can be used as specific biosignatures of infection by particular pathogens for both animal and human infectious disease (Ramilo and Mejias, 2009; Mejias and Ramilo, 2014; Chaussabel, 2015; Ko et al., 2015; Holcomb et al., 2017). For example, we and others have applied this approach to bovine tuberculosis (BTB) caused by infection with Mycobacterium bovis (Meade et al., 2007; Killick et al., 2011; Blanco et al., 2012; Churbanov and Milligan, 2012; McLoughlin et al., 2014; Cheng et al., 2015). It is also important to note that peripheral blood transcriptomics using technologies such as microarrays or RNA-sequencing (RNA-seq) can be used to monitor changes in the physiological status of domestic animals due to reproductive status, diet and nutrition or stress (O'Loughlin et al., 2012; Takahashi et al., 2012; Song et al., 2013; Kolli et al., 2014; Shen et al., 2014; de Greeff et al., 2016; Elgendy et al., 2016; Jégou et al., 2016).

During the last 15 years, a major hindrance to whole blood transcriptomics studies has emerged, which is the presence of large quantities of globin mRNA transcripts in peripheral blood from many mammalian species (Wu et al., 2003; Fan and Hegde, 2005; Liu et al., 2006). This is a consequence of abundant α globin and β globin mRNA transcripts in circulating reticulocytes, which in humans, may account for more than 95% of the total cellular mRNA content in these immature erythrocytes (Debey et al., 2004). Reticulocytes, in turn, account for 1–4% of the erythrocytes in healthy adult humans, which corresponds to between 5 × 10<sup>7</sup> and 2 × 10<sup>8</sup> cells per ml compared to 7 × 10<sup>6</sup> cells per ml for leukocytes (Greer et al., 2013). Hence, globin transcripts can account for a substantial proportion of total detectable mRNAs in peripheral blood samples collected from humans and many other mammals (Bruder et al., 2010; Winn et al., 2010; Schwochow et al., 2012; Choi et al., 2014; Shin et al., 2014; Bowyer et al., 2015; Huang et al., 2016; Morey et al., 2016). In particular, for humans, more than 70% of peripheral blood mRNA transcripts are derived from the haemoglobin subunit alpha 1, subunit alpha 2 and subunit beta genes (HBA1, HBA2, and HBB) (Wu et al., 2003; Field et al., 2007; Mastrokolias et al., 2012).
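The cell-count figures quoted above can be checked with a short calculation. The sketch below assumes an erythrocyte concentration of 5 × 10<sup>9</sup> cells per ml, a value that is not stated in the text but is back-calculated from the quoted reticulocyte range; the variable names are illustrative only.

```python
# Assumed value, back-calculated from the quoted range (5e7 cells/ml at 1%).
ERYTHROCYTES_PER_ML = 5_000_000_000
# Leukocyte concentration quoted in the text.
LEUKOCYTES_PER_ML = 7_000_000


def reticulocytes_per_ml(percent_of_erythrocytes):
    """Reticulocyte concentration for a given whole-number percentage of erythrocytes."""
    return ERYTHROCYTES_PER_ML * percent_of_erythrocytes // 100


low = reticulocytes_per_ml(1)   # lower bound of the 1-4% range
high = reticulocytes_per_ml(4)  # upper bound of the 1-4% range

# How many reticulocytes there are per leukocyte at each bound.
ratio_low = low / LEUKOCYTES_PER_ML
ratio_high = high / LEUKOCYTES_PER_ML
```

Under this assumption, reticulocytes outnumber leukocytes by roughly 7-fold to 29-fold, which is consistent with globin transcripts dominating the mRNA pool of whole blood.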

The emergence of massively parallel transcriptome profiling for clinical applications in human peripheral blood (initially with gene expression microarrays, but more recently using RNA-seq) has prompted development of methods for the systematic reduction of globin mRNAs in total RNA samples purified from peripheral blood samples, including: oligonucleotides that bind to globin mRNA molecules with subsequent digestion of the RNA strand of the RNA:DNA hybrid (Wu et al., 2003); peptide nucleic acid (PNA) oligonucleotides that are complementary to globin mRNAs and block reverse transcription of these targets (Liu et al., 2006); the GLOBINclear™ system, which uses biotinylated oligonucleotides that hybridise with globin transcripts followed by capture and separation using streptavidin-coated magnetic beads (Field et al., 2007); and the recently introduced GlobinLock method that uses a pair of modified oligonucleotides complementary to the 3′ portion of globin transcripts and that block enzymatic extension (Krjutškov et al., 2016).

In the present study we use RNA-seq data generated from globin-depleted and non-depleted total RNA purified from human and porcine peripheral blood, in conjunction with non-depleted total RNA isolated from equine and bovine peripheral blood, for a comparative investigation of the impact of reticulocyte-derived globin mRNA transcripts on routine transcriptome profiling of blood in domestic cattle and horses. The primary objective of the present study was to test the hypothesis that both cattle and horses exhibit significantly lower quantities of haemoglobin gene transcripts compared to humans and pigs.

## MATERIALS AND METHODS

#### Data Sources

RNA-seq data sets from human peripheral whole blood samples used for assessment of globin depletion and with parallel non-depleted controls (Shin et al., 2014) were obtained from the NCBI Gene Expression Omnibus (GEO) database (accession number GSE53655). A comparable RNA-seq data set from globin-depleted and non-depleted porcine peripheral whole blood was obtained directly from the study authors (Choi et al., 2014). A published RNA-seq data set (Ropka-Molik et al., 2017) from equine non-depleted peripheral whole blood was obtained from the NCBI GEO database (accession number GSE83404). Finally, bovine RNA-seq data from peripheral whole blood were generated by us as described below and can be obtained from the European Nucleotide Archive (ENA) database (PRJEB27764). A summary overview of the methodology used for the current study is shown in **Figure 1**.

#### Human, Porcine and Equine Sample Collection, Globin Depletion, and RNA-Seq Libraries

Detailed information concerning ethics approval, sample collection, total RNA extraction, and RNA-seq library preparation and sequencing for the human, porcine, and equine data sets is provided in the original publications (Choi et al., 2014; Shin et al., 2014; Ropka-Molik et al., 2017). **Supplementary Table 1** provides summary information on the human, porcine and equine samples and RNA-seq libraries.

In brief, for the human samples, peripheral blood from six healthy subjects (three females and three males) was collected into PAXgene blood RNA tubes (PreAnalytiX/Qiagen Ltd., Manchester, UK). Total RNA, including small RNAs, was purified from the collected blood samples using the PAXgene Blood miRNA Kit (PreAnalytiX/Qiagen Ltd.) as described by Shin et al. (2014). Human HBA1, HBA2 and HBB mRNA transcripts were depleted from a subset of the total RNA samples using the GLOBINclear kit (Invitrogen™/Thermo Fisher Scientific, Loughborough, UK). RNA-seq data was then generated using 24 paired-end (PE) RNA-seq libraries (12 undepleted and 12 globin-depleted) generated from the six biological replicates and six identical technical replicates created from pooled total RNA across all six donor samples. The multiplexing and sequencing was then performed such that data for the 12 samples in each treatment group (undepleted and globin-depleted) was generated from two separate lanes of a single flow cell twice, for a total of four sequencing lanes (Shin et al., 2014).

Porcine peripheral blood samples were collected from 12 healthy crossbred pigs [Duroc × (Landrace × Yorkshire)] using Tempus™ blood RNA tubes (Applied Biosystems™/Thermo Fisher Scientific, Warrington, UK) and total RNA was purified using the MagMAX™ for Stabilized Blood Tubes RNA Isolation Kit (Invitrogen™/Thermo Fisher Scientific) (Choi et al., 2014). Porcine HBA and HBB mRNA transcripts were subsequently depleted from a subset of the total RNA samples using a modified RNase H globin depletion method with custom porcine-specific antisense oligonucleotides for HBA and HBB. RNA-seq data was then generated from 24 PE RNA-seq libraries (12 undepleted and 12 globin-depleted).

Equine peripheral blood samples were collected using Tempus™ blood RNA tubes from 12 healthy Arabian horses (five females and seven males) at three different time points during flat racing training (Ropka-Molik et al., 2017). In addition, peripheral blood samples were collected from six healthy untrained Arabian horses (two females and four males). Total RNA was purified using the MagMAX™ for Stabilized Blood Tubes RNA Isolation Kit and 37 of the 42 total RNA samples were used to generate single-end (SE) libraries for RNA-seq data generation. Globin depletion for the equine samples was not performed prior to RNA-seq library preparation (Katarzyna Ropka-Molik, pers. comm.).

#### Bovine Peripheral Blood Collection and RNA Extraction

Approximately 3 ml of peripheral blood from 10 age-matched healthy male Holstein-Friesian calves were collected into Tempus™ blood RNA tubes. The Tempus™ Spin RNA Isolation Kit (Applied Biosystems™/Thermo Fisher Scientific) was used to perform total RNA extraction and purification, following the manufacturer's instructions. RNA quantity and quality checking were performed using a NanoDrop™ 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and an Agilent 2100 Bioanalyzer using an RNA 6000 Nano LabChip kit (Agilent Technologies Ltd., Cork, Ireland). The majority of samples displayed a 260/280 ratio >1.8 and an RNA integrity number (RIN) >8.0 (**Supplementary Table 2**). Globin mRNA depletion was not performed on the total RNA samples purified from bovine peripheral blood samples.

#### Bovine RNA-Seq Library Generation and Sequencing

Individually barcoded strand-specific RNA-seq libraries were prepared with 1 µg of total RNA from each sample. Two rounds of poly(A)<sup>+</sup> RNA purification were performed for all RNA samples using the Dynabeads® mRNA DIRECT™ Micro Kit (Thermo Fisher Scientific) according to the manufacturer's instructions. The purified poly(A)<sup>+</sup> RNA was then used to generate strand-specific RNA-seq libraries using the ScriptSeq™ v2 RNA-Seq Library Preparation Kit, the ScriptSeq™ Index PCR Primers (Sets 1 to 4) and the FailSafe™ PCR enzyme system (all sourced from Epicentre®/Illumina® Inc., Madison, WI, USA), according to the manufacturer's instructions.

RNA-seq libraries were purified using the Agencourt® AMPure® XP system (Beckman Coulter Genomics, Danvers, MA, USA) according to the manufacturer's instructions for double size selection (0.75× followed by 1.0× ratio). RNA-seq libraries were quantified using a Qubit® fluorometer and Qubit® dsDNA HS Assay Kit (Invitrogen™/Thermo Fisher Scientific), while library quality checks were performed using an Agilent 2100 Bioanalyzer and High Sensitivity DNA Kit (Agilent Technologies Ltd.). Individually barcoded RNA-seq libraries were pooled in equimolar quantities and the quantity and quality of the final pooled libraries (three pools in total) were assessed as described above. Cluster generation and high-throughput sequencing of three pooled RNA-seq libraries were performed using an Illumina® HiSeq™ 2000 Sequencing System at the MSU Research Technology Support Facility (RTSF) Genomics Core (https://rtsf.natsci.msu.edu/genomics; Michigan State University, MI, USA). Each of the three pooled libraries was sequenced independently on five lanes split across multiple Illumina® flow cells. The pooled libraries were sequenced as PE 2 × 100 nucleotide reads using Illumina® version 5.0 sequencing kits.

Deconvolution (filtering and segregation of sequence reads based on the unique RNA-seq library barcode index sequences; **Supplementary Table 2**) was performed by the MSU RTSF Genomics Core using a pipeline that simultaneously demultiplexed and converted pooled sequence reads into discrete FASTQ files for each RNA-seq sample with no barcode index mismatches permitted. The RNA-seq FASTQ sequence read data for the bovine samples were obtained from the MSU RTSF Genomics Core FTP server.

#### RNA-Seq Data Quality Control and Filtering/Trimming of Reads

Bioinformatics procedures and analyses were performed as described below for the human, porcine, equine, and bovine samples, except where specifically indicated. All of the bioinformatics workflow scripts were developed using GNU bash (version 4.3.48) (Free Software Foundation, 2013), Python (version 3.5.2) (Python Software Foundation, 2017), and R (version 3.4.0) (R Core Team, 2017). The scripts and further information are available at a public GitHub repository (https://github.com/carolcorreia/Globin\_RNA-sequencing).

Computational analyses were performed on a 32-core Linux Compute Server (4× AMD Opteron™ 6220 processors at 3.0 GHz with 8 cores each), with 256 GB of RAM, 24 TB of hard disk drive storage, and with Ubuntu Linux OS (version 14.04.4 LTS). Deconvoluted FASTQ files (generated from SE equine RNA-seq libraries and PE RNA-seq libraries for the other species) were quality-checked with FastQC (version 0.11.5) (Andrews, 2016).

Using the ngsShoRT software package (version 2.2) (Chen et al., 2014), filtering/trimming consisted of: (1) removal of SE or PE reads with adapter sequences (with up to three mismatches); (2) removal of SE or PE reads of poor quality (i.e., at least one of the reads containing ≥25% bases with a Phred quality score below 20); (3) for porcine samples only, 10 bases were trimmed at the 3′ end of all reads; (4) removal of SE or PE reads that did not meet the required minimum length (70 nucleotides for human and equine, 80 nucleotides for porcine and 100 nucleotides for bovine). Filtered/trimmed FASTQ files were then re-evaluated using FastQC. Filtered FASTQ files were transferred to a 36 core/64-thread Compute Server (2× Intel® Xeon® CPU E5-2697 v4 at 2.30 GHz with 18 cores each), with 512 GB of RAM, 96 TB SAS storage (12× 8 TB at 7200 rpm), 480 GB SSD storage, and with Ubuntu Linux OS (version 16.04.2 LTS).
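
The quality and length criteria in steps (2) and (4) can be illustrated with a minimal Python sketch. This is not the ngsShoRT implementation; the function names are hypothetical and Phred+33 quality encoding is assumed.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def is_poor_quality(quality, min_phred=20, max_low_fraction=0.25):
    """Return True if at least 25% of the bases have a Phred score below 20."""
    scores = phred_scores(quality)
    if not scores:
        return True
    low = sum(1 for q in scores if q < min_phred)
    return low / len(scores) >= max_low_fraction


def keep_reads(reads, min_length):
    """Keep an SE read or a PE read pair only if every read passes both filters.

    `reads` is a list of (sequence, quality) tuples: one for SE, two for PE.
    """
    for seq, qual in reads:
        if len(seq) < min_length or is_poor_quality(qual):
            return False
    return True
```

Under these assumptions, a bovine read pair would be evaluated with `keep_reads(pair, min_length=100)`, and the pair is discarded if either mate fails.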

#### Transcript Quantification

The Salmon software package (version 0.8.2) (Patro et al., 2017) was used in quasi-mapping mode for transcript quantification. Sequence-specific and fragment-level GC bias correction was enabled, and transcript abundance in transcripts per million (TPM) was estimated for each filtered library (multiple lanes from the same library were processed together) after mapping of SE or PE reads to their respective reference transcriptomes. As summarised in **Table 1**, the NCBI RefSeq database is currently the only one to contain haemoglobin gene annotations for all species analysed. Hence, NCBI RefSeq reference transcript models were used for the human, porcine, equine, and bovine data sets. Detailed information about these reference transcriptomes is provided in **Supplementary Table 3**.
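
For readers unfamiliar with the TPM unit, the following Python sketch shows the standard definition: each transcript's read count is divided by its effective length, and the resulting rates are rescaled to sum to one million. It is only an illustration of the unit, not of Salmon's estimation procedure, and the function name and inputs are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript counts and effective lengths into TPM values."""
    # Reads per base of effective transcript length
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so that the values sum to one million
    return [r / total * 1e6 for r in rates]
```

Because the values always sum to one million within a library, TPM is a relative measure, which is why a few very abundant transcripts can reduce the values of all others.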

#### Gene Annotations and Summarisation of TPM Estimates at the Gene Level

Using R (3.5.0) within the RStudio IDE (version 1.1.447) (R Studio Team, 2015) and Bioconductor (version 3.7 using BiocInstaller 1.30.0) (Gentleman et al., 2004), the GenomicFeatures (version 1.32.0) (Lawrence et al., 2013) and AnnotationDbi (version 1.42.1) (Pagès et al., 2017) packages were used to obtain corresponding gene and transcript identifiers from the NCBI RefSeq annotation releases pertinent to each species, as detailed in **Table 1**. Using these identifiers, the tximport (version 1.8.0) package (Soneson et al., 2015) was used to import into R and summarise at gene level the TPM estimates obtained from the Salmon tool. A threshold of greater than or equal to 1 TPM across at least half of the total number of samples (≥12 for human and porcine, ≥18 for equine, and ≥5 for bovine) was applied in order to remove lowly expressed genes.
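
The two steps described above, summing transcript-level TPMs per gene and removing lowly expressed genes, can be sketched in Python as follows. This is a simplified stand-in for tximport, which offers further options; the function names and data structures are assumptions made for illustration.

```python
from collections import defaultdict


def summarise_to_gene(transcript_tpm, tx2gene):
    """Sum transcript-level TPMs per gene for one sample."""
    gene_tpm = defaultdict(float)
    for tx, value in transcript_tpm.items():
        gene_tpm[tx2gene[tx]] += value
    return dict(gene_tpm)


def filter_expressed(gene_by_sample, min_tpm=1.0, min_samples=12):
    """Keep genes with TPM >= min_tpm in at least min_samples samples.

    `gene_by_sample` maps a gene identifier to a list of TPM values,
    one value per sample.
    """
    return {
        gene: values
        for gene, values in gene_by_sample.items()
        if sum(1 for v in values if v >= min_tpm) >= min_samples
    }
```

With the thresholds given in the text, `min_samples` would be 12 for the human and porcine data, 18 for equine, and 5 for bovine.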

#### Data Exploration, Plotting, and Summary Statistics

Data wrangling and tidying for all species were performed using the following R packages: tidyverse (version 1.2.1) (Wickham, 2017b), dplyr (version 0.7.5) (Wickham et al., 2017), tidyr (version 0.8.1) (Wickham and Henry, 2017), reshape2 (version 1.4.3) (Wickham, 2017a), and magrittr (version 1.5) (Bache and Wickham, 2017). The ggplot2 (version 2.2.1) (Wickham and Chang, 2017) and ggjoy (version 0.4.1) (Wilke, 2017) packages were used for figure generation. Finally, the mean and standard deviation were calculated for the undepleted and globin-depleted groups in each species using the skimr (version 1.0.2) R package (McNamara et al., 2017).

## RESULTS AND DISCUSSION

#### Status of Human, Porcine, Equine, and Bovine Haemoglobin Gene Annotations

Annotation of the haemoglobin subunit alpha 1 and 2 genes (HBA1 and HBA2, respectively) is well-established for the human genome; however, annotations for these genes in the porcine, equine, and bovine genomes are inconsistent across databases. As shown in **Table 1**, the porcine HBA gene annotation is absent from Ensembl and the UCSC Table Browser. For the NCBI RefSeq database, this gene has been assigned to two loci (LOC110259958 and LOC100737768) that have similar descriptions (haemoglobin subunit alpha and haemoglobin subunit alpha-like). Therefore, these NCBI LOC symbols were used.

Equine HBA (HBA1) and HBA2 genes are absent from the current Ensembl annotation release. Similarly, bovine HBA1 and HBA (HBA2) have been annotated as GLNC1 in Ensembl, whereas HBA1 is absent from the UCSC Table Browser annotation (**Table 1**). In the NCBI RefSeq database, equine HBA (HBA1) is described as haemoglobin subunit alpha 1, and bovine HBA (HBA2) is described as haemoglobin subunit alpha 2; thus their descriptions are shown in parentheses herein. In contrast to these observations, haemoglobin subunit beta (HBB) genes for the four species are well-annotated in Ensembl, NCBI RefSeq and UCSC Genome Browser databases (**Table 1**).

TABLE 1 | Status of current human, porcine, equine, and bovine haemoglobin gene annotations in the Ensembl, NCBI RefSeq, and UCSC databases.

<sup>a</sup>(Zerbino et al., 2018).

<sup>b</sup>(O'Leary et al., 2016).

<sup>c</sup>(Kent et al., 2002).

<sup>d</sup>(Karolchik et al., 2004).

<sup>e</sup>(Tyner et al., 2017).

At the time of writing, NCBI RefSeq is the only database that contains annotations for all three haemoglobin genes in all species analysed. Additionally, equine and bovine gene annotations are based on the latest genome assemblies (**Table 1**). EquCab3 and ARS-UCD1.2 have incorporated major improvements compared to previous versions, including increased genome coverage (from 6.8× and 9×, to 80×, respectively), and incorporation of PacBio sequencing reads (Kalbfleisch et al., 2018; Rosen et al., 2018).

#### Basic RNA-Seq Data Outputs

Unfiltered SE (equine libraries) or PE (human, porcine, and bovine libraries) RNA-seq FASTQ files were quality-checked, then adapter- and quality-filtered, prior to transcript quantification. As shown in **Table 2**, the human and porcine undepleted groups each had ∼40 million (M) raw reads per library, whereas globin-depleted libraries showed a mean of ∼37 and 31 M, respectively. Equine and bovine libraries, which did not include a globin depletion step, had an average of 24 M raw reads and 21 M raw read pairs, respectively.

After adapter- and quality-filtering of RNA-seq libraries, an average of 20 and 29% of read pairs were removed from the human undepleted and globin-depleted libraries, respectively. Conversely, ∼12% of read pairs were removed from each of the porcine undepleted and globin-depleted libraries. For the undepleted equine and bovine RNA-seq libraries, an average of 0.2% of reads and 17% of read pairs were removed, respectively. Detailed information on filtering/trimming of RNA-seq libraries from all species, including technical replicates from libraries sequenced over multiple lanes, is presented in **Supplementary Table 4**. All data sets exhibited a mean mapping rate >70% (**Table 2**). **Supplementary Table 5** contains sample-specific RNA-seq mapping statistics.

#### Transcript Quantification

Transcript-level TPM estimates generated using the Salmon tool were imported into the R environment and summarised at gene level with the package tximport (Soneson et al., 2015). Gene-level TPM estimates represent the sum of corresponding transcript-level TPMs and provide results that are more accurate and comprehensible than transcript-level estimates (Soneson et al., 2015). In the current study, gene-level TPM estimates are referred to as TPM.

Filtering of lowly expressed genes (see section Gene Annotations and Summarisation of TPM Estimates at the Gene Level) resulted in 12,951 genes expressed across all human samples, representing 24% of the 54,644 total annotated genes and pseudogenes. Porcine samples showed a total of 9,396 expressed genes (31% of 30,334 annotated genes and pseudogenes), and equine and bovine samples exhibited 12,724 (38% of 33,146) and 14,044 (40% of 35,143) expressed genes, respectively.

The density distribution of TPM values for the human and porcine samples improved after globin depletion; this is evident by the shift of gene detection levels toward greater log<sub>10</sub> TPM values for the globin-depleted samples in **Figure 2**. In this regard, it is noteworthy that the undepleted bovine and equine samples also exhibited similar TPM density distributions to the human and porcine globin-depleted samples.

#### Proportions of Human and Porcine Haemoglobin Gene Transcripts in Undepleted and Depleted Peripheral Blood

In line with previous reports (Field et al., 2007; Mastrokolias et al., 2012), the proportion of haemoglobin gene transcripts (HBA1, HBA2, and HBB) detected in undepleted human peripheral blood samples for the current study averaged 70% (**Figure 3** and **Supplementary Table 6**), which is lower than the mean proportion of 81% reported by Shin et al. (2014). On the other hand, after depletion the human samples exhibited an identical reduction to a 17% proportion of globin sequence reads in both the present study and that of Shin et al. (2014) (**Figure 3** and **Supplementary Table 6**).

\*The Salmon tool categorises fragments as a single read (for SE RNA-seq libraries) or a read pair (for PE RNA-seq libraries).

In the current study, for the undepleted porcine peripheral blood samples, the percentage of haemoglobin gene transcripts (LOC110259958 [HBA], LOC100737768 [HBA], and HBB) observed as a proportion of the total expressed genes was 72% (**Figure 3** and **Supplementary Table 6**), which is considerably larger than the mean of 46.1% reported in the original study (Choi et al., 2014). Similarly, after depletion, the porcine samples in the present study contained a mean proportion of 22% globin transcripts (**Figure 3** and **Supplementary Table 6**) compared to a mean proportion of 8.9% reported by Choi et al. (2014). Additionally, **Table 3** shows the mean TPM for each haemoglobin gene across undepleted or globin-depleted samples.
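
One plausible way to compute such a proportion from gene-level TPM values is sketched below in Python. It assumes the proportion is the summed TPM of the haemoglobin genes divided by the summed TPM of all expressed genes in a sample; the function name is hypothetical and the exact calculation used for **Supplementary Table 6** may differ.

```python
def globin_percentage(gene_tpm, globin_genes):
    """Percentage of summed gene-level TPM attributable to haemoglobin genes.

    `gene_tpm` maps gene symbols to TPM values for a single sample.
    """
    total = sum(gene_tpm.values())
    globin = sum(gene_tpm.get(gene, 0.0) for gene in globin_genes)
    return 100.0 * globin / total
```

For a human sample, `globin_genes` would be `["HBA1", "HBA2", "HBB"]`; per-group means then follow by averaging the per-sample percentages.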

A number of possible explanations, including the different approaches used for read mapping and transcript quantification, may account for the different proportions of haemoglobin gene transcripts detected in human and porcine samples for the present study compared to the original studies (Choi et al., 2014; Shin et al., 2014). For the present study, a recently developed lightweight alignment method was adopted (Salmon and tximport), in contrast to the more traditional methodologies used in the original publications. Shin et al. (2014) used the TopHat and Cufflinks software tools (Trapnell et al., 2012), while Choi et al. (2014) implemented TopHat with htseq-count (Anders et al., 2015). In addition to this, different gene annotations were used: NCBI Homo sapiens Annotation Release 109 and NCBI Sus scrofa Annotation Release 106 were used for the present study, while UCSC hg18 (H. sapiens) and Ensembl release 71 (S. scrofa) were used by Shin et al. (2014) and Choi et al. (2014), respectively.

#### Equine and Bovine Peripheral Blood Contains Extremely Low Levels of Haemoglobin Gene Transcripts

The equine and bovine peripheral blood samples, which did not undergo globin depletion, had extremely low proportions of haemoglobin gene transcripts to total expressed genes: 0.21 and 0.17%, respectively (**Figure 3** and **Supplementary Table 6**). Notably, similar results have been reported in a transcriptomics study of bovine peripheral blood in response to vaccination against neonatal pancytopenia. In that study, 12 cows were profiled before and after vaccination (24 peripheral blood samples in total), and a mean proportion of 1.0% of RNA-seq reads was observed to map to the bovine α haemoglobin gene cluster on BTA25 or to the β haemoglobin gene cluster on BTA15 (Demasius et al., 2013). To the best of our knowledge, this is the first time that the average number of equine haemoglobin transcripts has been reported for RNA-seq data.

Finally, it is important to note that log<sub>2</sub> TPM values for haemoglobin gene transcripts in the undepleted equine and bovine peripheral blood RNA samples are substantially lower than log<sub>2</sub> TPM values for the globin-depleted human and porcine peripheral blood RNA samples (**Figure 4**). This is a direct consequence of extremely low levels of circulating reticulocytes in equine and bovine peripheral blood (Tablin and Weiss, 1985; Harper et al., 1994; Hossain et al., 2003; Cooper et al., 2005).

## CONCLUSION

In light of our RNA-seq data analyses, we propose that globin mRNA transcript depletion is not a pre-requisite for transcriptome profiling of bovine and equine peripheral blood samples. This observation greatly simplifies the laboratory and bioinformatics workflows required for RNA-seq studies of whole blood collected from domestic cattle and horses. It will also be directly relevant to future work on blood-based biomarker and biosignature development in the context of infectious disease, reproduction, nutrition, and animal welfare. For example, transcriptomics of peripheral blood has been used extensively in development of new diagnostic and prognostic modalities for human tuberculosis (HTB) disease caused by infection with Mycobacterium tuberculosis (for reviews see: Blankley et al., 2014; Haas et al., 2016; Weiner and Kaufmann, 2017; Goletti et al., 2018). Therefore, as a consequence of this HTB research, comparable transcriptomics studies in cattle (Meade et al., 2007; Killick et al., 2011; Blanco et al., 2012; Churbanov and Milligan, 2012; McLoughlin et al., 2014; Cheng et al., 2015), and the ease with which RNA-seq can be performed in bovine peripheral blood, it should be feasible to develop transcriptomics-based biomarkers and biosignatures for bovine tuberculosis caused by M. bovis infection.

## DATA ACCESSIBILITY

The RNA-seq data generated for this study using peripheral blood from 10 age-matched healthy male Holstein-Friesian calves can be obtained from the ENA database (PRJEB27764).

## ETHICS STATEMENT

Animal experimental work for the present study (cattle samples) was carried out according to the UK Animal (Scientific Procedures) Act 1986. The study protocol was approved by the Animal Health and Veterinary Laboratories Agency (AHVLA–Weybridge, UK), now the Animal & Plant Health Agency (APHA), Animal Use Ethics Committee (UK Home Office PCD number 70/6905).

## AUTHOR CONTRIBUTIONS

DEM, SG, CC, and KM conceived and designed the project and organised bovine sample collection. KM, NN, DAM, and JB performed RNA extraction and RNA-seq library generation. CC, KM, NN, KR-A, and DEM performed the analyses. CC and DEM wrote the manuscript. All authors reviewed and approved the final manuscript.

## FUNDING

This work was supported by Investigator Grants from Science Foundation Ireland (Nos: SFI/08/IN.1/B2038 and SFI/15/IA/3154), a Research Stimulus Grant from the Department of Agriculture, Food and the Marine (No: RSF 06 405), a European Union Framework 7 Project Grant (No: KBBE-211602- MACROSYS), a Brazilian Science Without Borders—CAPES grant (No: BEX-13070-13-4) and the UCD Wellcome Trust funded Computational Infection Biology Ph.D. Programme (Grant no: 097429/Z/11/Z).

## ACKNOWLEDGMENTS

The authors wish to express their gratitude to Prof Martin Vordermeier and Dr Bernardo Villarreal-Ramos (Animal and Plant Health Agency, UK) for provision of bovine peripheral blood samples, and to Prof Graham Plastow (University of Alberta, Canada) for provision of porcine peripheral blood RNA-seq data. We also thank Drs. Gabriella Farries (University College Dublin) and Kerri Malone (EMBL-EBI, Cambridge, UK) for stimulating discussion and advice concerning genome annotations, equine genetics and data visualisation.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00278/full#supplementary-material

## REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Correia, McLoughlin, Nalpas, Magee, Browne, Rue-Albrecht, Gordon and MacHugh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Analysis of a Commercial Egg Laying Line Challenged With Newcastle Disease Virus

Kaylee Rowland<sup>1</sup>, Anna Wolc<sup>1,2</sup>, Rodrigo A. Gallardo<sup>3</sup>, Terra Kelly<sup>3,4</sup>, Huaijun Zhou<sup>4</sup>, Jack C. M. Dekkers<sup>1</sup> and Susan J. Lamont<sup>1</sup>\*

<sup>1</sup> Department of Animal Science, Iowa State University, Ames, IA, United States, <sup>2</sup> Hy-Line International, Dallas Center, IA, United States, <sup>3</sup> School of Veterinary Medicine, University of California, Davis, Davis, CA, United States, <sup>4</sup> Department of Animal Science, University of California, Davis, Davis, CA, United States

#### Edited by:

Andrea B. Doeschl-Wilson, University of Edinburgh, United Kingdom

#### Reviewed by:

Yniv Palti, Cool and Cold Water Aquaculture Research (USDA-ARS), United States

Filippo Biscarini, Consiglio Nazionale delle Ricerche (CNR), Italy

#### \*Correspondence:

Susan J. Lamont sjlamont@iastate.edu

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 19 April 2018 Accepted: 30 July 2018 Published: 20 August 2018

#### Citation:

Rowland K, Wolc A, Gallardo RA, Kelly T, Zhou H, Dekkers JCM and Lamont SJ (2018) Genetic Analysis of a Commercial Egg Laying Line Challenged With Newcastle Disease Virus. Front. Genet. 9:326. doi: 10.3389/fgene.2018.00326

In low income countries, chickens play a vital role in daily life. They provide a critical source of protein through egg production and meat. Newcastle disease, caused by avian paramyxovirus type 1, has been ranked as the most devastating disease for scavenging chickens in Africa and Asia. High mortality among flocks infected with velogenic strains leads to a devastating loss of dietary protein and buying power for rural households. Improving the genetic resistance of chickens to Newcastle Disease virus (NDV), in addition to vaccination, is a practical target for improvement of poultry production in low income countries. Because response to NDV has a component of genetic control, it can be influenced through selective breeding. Adding genomic information to a breeding program can increase the amount of genetic progress per generation. In this study, we challenged a commercial egg-laying line with a lentogenic strain of NDV, measured phenotypic responses, collected genotypes, and associated genotypes with phenotypes. Collected phenotypes included viral load at 2 and 6 days post-infection (dpi), antibody levels pre-challenge and 10 dpi, and growth rates pre- and post-challenge. Six suggestive QTL associated with response to NDV and/or growth were identified, including novel and known QTL confirming previously reported associations with related traits. Additionally, previous RNA-seq analysis provided support for several of the genes located in or near the identified QTL. Considering the trend of negative genetic correlation between antibody and Newcastle Disease tolerance (growth under disease) and estimates of moderate to high heritability, we provide evidence that these NDV response traits can be influenced through selective breeding. Producing chickens that perform favorably in challenging environments will ultimately increase the supply of quality protein for human consumption.

Keywords: Newcastle disease virus, GWAS, poultry, disease challenge, genetic parameters, QTL, immune response

**Abbreviations:** dpi, days post-infection; EID50, 50% embryo infectious dose; ELISA, enzyme-linked immunosorbent assay; GRM, genomic relationship matrix; Mb, mega base; NDV, Newcastle disease virus; QTL, quantitative trait loci; SE, standard error; SNP, single nucleotide polymorphism.

## INTRODUCTION

In low income countries, chickens play a vital role in daily life. They provide important sources of high quality protein and macro and micronutrients. They are also important for livelihood and gender empowerment, as women are often the beneficiaries of poultry production, which is often not true with larger livestock (Guèye, 2000).

Newcastle disease, caused by avian paramyxovirus type 1, has been ranked as the most devastating disease for scavenging chickens (village chickens, allowed to roam with no to minimal feed provided) in Africa and Asia (Kitalyi, 1998). The more virulent strains of the virus can cause 80% mortality (number of deaths in the flock per infection event) in scavenging flocks (Kitalyi, 1998). High mortality among flocks leads to a devastating loss of dietary protein and buying power for rural households. Prevention of this disease through vaccination is challenging in rural, scavenging production systems. Difficulties arise from the need to ensure a cold chain during transport of vaccines, inadequate vaccination programs, and the high costs of administering booster vaccinations (Mayers et al., 2017). Improving the genetic resistance of chickens to NDV is a practical target for improvement of poultry production in low income countries.

Selective breeding has a demonstrated history of success in poultry production (Havenstein et al., 1994) and can be used to modulate many traits of chickens. Several reports have demonstrated genetic differences in response to NDV (Cole and Hutt, 1961; Gordon et al., 1970; Peleg et al., 1976; Soller et al., 1981; Pitcovski et al., 1987). Because response to NDV has a component of genetic control, it can be influenced through selective breeding. Adding genomic information to a breeding program can increase the amount of genetic progress per generation (Fulton, 2012). Recently, there has been a gap in the knowledge accumulation and study of NDV. However, the threat of NDV continues, as demonstrated by the 2018 outbreaks of virulent NDV in California. In this study, we challenged a commercial egg-laying line with a lentogenic (lowly virulent) strain of NDV, measured phenotypic responses, collected genotypes, and associated genotypes with phenotypes. We identified genomic regions associated with response to NDV and/or growth. A selective breeding program can be implemented, e.g., utilizing genomic information identified in this study, to produce chickens that perform favorably in challenging environments and ultimately increase the supply of quality protein for human consumption.

## MATERIALS AND METHODS

#### Animals and Husbandry

The Iowa State University Institutional Animal Care and Use Committee approved all animal procedures and care in this study (log #1-13-7490-G). Pooled semen from 16 sires was used to inseminate 145 dams to produce 3 hatches of 200 mixed-sex chicks (N = 600) of a commercial brown egg laying line (Hy-Line Brown, Hy-Line International). Birds were provided ad libitum access to feed and water throughout the study period. Initially, 23 h of light was provided, which was gradually decreased to 13.5 h of light by day 29. Temperature at chick level on day of placement was 35°C and gradually decreased to 24°C by day 29 and held until completion of the experiment.

#### Experimental Design

On day of hatch, chicks were transported to a biosafety level II facility at Iowa State University. For each hatch, chicks were placed into one of three rooms, using pedigree information to distribute half-sibs into different rooms. At 21 days of age (0 dpi), birds were inoculated with 10<sup>8</sup> 50% embryo infectious doses (EID50) of live attenuated type B1 LaSota strain NDV in a volume of 200 µL. Virus propagation was detailed previously by Deist et al. (2017b). Virus was administered via a natural, ocular-nasal route. Each eye and each naris received 50 µL. Lachrymal fluid samples were collected to quantify viral load at 20, 23, and 27 days of age, hereafter designated as pre-challenge, 2 dpi and 6 dpi, respectively. Blood samples were collected to measure anti-NDV antibody levels on days 20 and 31, hereafter referred to as pre-challenge and 10 dpi, respectively. Body weights were recorded on days 0, 13, 21, 27, and 31 of age. The experimental design was performed across three replicates (3 hatches from the same dams and sires). In each replicate, 180 birds were challenged, 540 in total. The objective of this study is to find genotypic associations with quantitative responses to a viral challenge. Thus, pre-challenge measurements with confirmed null viral load serve as an internal control group.

#### Viral Load

To quantify viral load, viral RNA was isolated from lachrymal fluid and quantified via qPCR at three time points: pre-challenge (n = 89), 2 dpi (n = 468), and 6 dpi (n = 470) (**Table 1**). These times were chosen to detect early and maintained viral load (Gallardo, personal communication). Time points also coordinated with related studies (Deist et al., 2017a,b, 2018b; Zhang et al., 2018). Production of lachrymal fluid was induced by placing sodium chloride granules on each eye. The resulting fluid accumulation was collected with a pipette. Viral RNA was isolated from the lachrymal fluid using a MagMAX-96 viral RNA isolation kit (Life Technologies, Carlsbad, CA, United States). Isolated RNA was quantified using an LSI VetMAX NDV real-time PCR kit (Life Technologies, Carlsbad, CA, United States) targeted to the matrix protein (M) gene of NDV. Viral RNA was isolated once per sample and quantified via qPCR in duplicate. Mean viral RNA copy number was calculated per sample and log transformed. To test the difference between time points, least squares means were calculated and Student's t-tests were performed in JMP (SAS Institute, Inc., Cary, NC, United States). In calculating least squares means, effects included qPCR plate, day, room nested within replicate, and sex.
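
The per-sample phenotype described above, the log-transformed mean of duplicate qPCR measurements, can be sketched in Python as follows. The base-10 logarithm is taken from the footnote to **Table 1**; the function name is hypothetical, and the handling of samples with no detectable virus is not specified in the text, so the sketch simply rejects them.

```python
import math


def log10_mean_copy_number(duplicates):
    """Average duplicate qPCR copy numbers, then apply a log10 transform."""
    mean = sum(duplicates) / len(duplicates)
    if mean <= 0:
        # Assumption: undetectable samples are handled separately upstream
        raise ValueError("mean copy number must be positive for a log transform")
    return math.log10(mean)
```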

#### Antibody

Anti-NDV antibody levels in sera were quantified pre-challenge (n = 453) and at 10 dpi (n = 448) using an IDEXX NDV ELISA for chickens (IDEXX Laboratories, Inc., Westbrook, ME, United States) (**Table 1**). The 10 dpi time point was chosen because this is the time needed to generate an acquired immune response (production of specific antibodies). This time also coordinated with related studies (Deist et al., 2017a,b, 2018b; Zhang et al., 2018). Each sample was quantified in duplicate and the average sample:positive (S/P) absorbance ratio was calculated per manufacturer's instructions. To test the difference between time points, a standard least squares effect leverage report and Student's t-test were performed in JMP (SAS Institute, Inc., Cary, NC, United States). Effects included day, room nested within replicate, and sex. Antibody levels were also quantified, just prior to the second hatch, on dams (n = 139), which had received multiple vaccines against NDV over their lifetime, using the same assay, except plasma was used instead of serum.

TABLE 1 | Descriptive statistics of phenotypes and estimates (SE) of variance components (proportions of phenotypic variance).

<sup>1</sup>Phenotypes log<sub>10</sub> transformed. <sup>2</sup>Outliers (> 3SD <) removed. <sup>3</sup>Number of phenotypic records in the association analysis. <sup>4</sup>Arithmetic mean. <sup>5</sup>Standard deviation.

#### Growth Rate

Body weights were recorded in grams on days 0, 13, 21 (0 dpi), 27 (6 dpi), and 31 (10 dpi). Pre-challenge growth rate (n = 473) was calculated as grams per day between days 0 and 21. Post-challenge growth rate (n = 470) was calculated as grams per day between days 21 and 31.
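
As a worked illustration of these definitions, the Python sketch below computes growth rate as weight gain divided by the number of days in the interval. The function name and the example weights in the note that follows are hypothetical.

```python
def growth_rate(weight_start, weight_end, day_start, day_end):
    """Average daily gain in grams per day over the interval."""
    return (weight_end - weight_start) / (day_end - day_start)
```

For a bird weighing 40 g at day 0, 250 g at day 21 and 370 g at day 31, this gives a pre-challenge rate of `growth_rate(40, 250, 0, 21)` = 10 g/day and a post-challenge rate of `growth_rate(250, 370, 21, 31)` = 12 g/day.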

#### Genotyping

Whole blood was collected on Whatman FTA cards (Sigma-Aldrich, St. Louis, MO, United States) from all chicks pre-challenge. Genomic DNA was isolated from FTA card punches, dried, and shipped to GeneSeek, Neogen Genomics (Lincoln, NE, United States). DNA was genotyped for 600,000 SNPs using the Axiom Chicken Genotyping Array (Kranis et al., 2013) (Thermo Fisher Scientific, Inc., Waltham, MA, United States). Axiom Chicken Genotyping Array annotation files, release 35, were based on galGal genome version 5.0 (Thermo Fisher Scientific). Quality filtering of genotype data included call rate ≥95 and minor allele frequency ≥0.01. Other filtering metrics (Nclus, FLD, HomRO, HomFLD, HetSO, ConversionType, BB.varX, BB.varY, AB.varX, AB.varY, AA.varX) and requirements are listed in **Table 2**. These metrics are described in the Axiom Analysis Suite User Guide obtained from Thermo Fisher Scientific (Applied Biosystems, 2017).
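
The two general filters named above, call rate and minor allele frequency, can be sketched per SNP in Python. The sketch assumes genotypes are coded as 0, 1 or 2 copies of one allele with `None` for missing calls, and it interprets the call rate threshold as 95%; both are assumptions, and the array-specific metrics in **Table 2** are not covered.

```python
def call_rate(genotypes):
    """Fraction of animals with a non-missing genotype at this SNP."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (coded 0/1/2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Apply both thresholds; the 0.95 call rate is an assumed interpretation."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```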

#### Genetic Parameters

Variance components and heritabilities were estimated in ASReml 4 (Gilmour et al., 2015) using the following univariate animal model:

$$Y_{ijk} = \mu + S_{i} + RR_{j} + A_{k} + e_{ijk} \tag{1}$$

where Y is the dependent variable of phenotype (viral load 2 and 6 dpi, antibody pre-challenge and 10 dpi, growth rate pre- and post-challenge). Sex (S) and a combined variable of room and replicate (RR) were fitted as fixed effects. Random effects included animal genetic effects (A) with a genomic relationship matrix (GRM) computed from SNP genotypes following the procedure described by VanRaden (2008), and residuals (e). For viral load at 2 and 6 dpi, qPCR plate was also added as a fixed effect, and for antibody pre-challenge, antibody level of the dam was added as a covariate. The random effect of dam was included for pre-challenge measurements of growth rate and antibody. Phenotypic variance was obtained by summing estimates of variance due to animal, residual, and dam (where applicable). Heritability was calculated as a ratio of the estimates of animal to phenotypic variance.
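
Two pieces of this procedure lend themselves to a short Python sketch: the GRM and the heritability ratio. The GRM function follows the first method of VanRaden (2008), in which genotypes are centred by twice the allele frequency and cross-products are scaled by 2Σp(1 − p); whether exactly this variant was used here is an assumption. Variance components themselves were estimated by ASReml and are treated as given inputs. Function names are hypothetical.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 genotypes.

    `genotypes` is a list of animals, each a list of allele counts per SNP.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Frequency of the counted allele at each SNP
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]
    # Centre each genotype by twice the allele frequency
    z = [[ind[j] - 2 * p[j] for j in range(n_snp)] for ind in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_ind)]
        for i in range(n_ind)
    ]


def heritability(var_animal, var_residual, var_dam=0.0):
    """Ratio of animal genetic variance to total phenotypic variance."""
    return var_animal / (var_animal + var_residual + var_dam)
```

The `var_dam` argument is left at zero for traits where the random dam effect was not fitted.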

#### Association Analysis

Association analyses were performed using the R package GenABEL (Aulchenko, 2015), using a hierarchical generalized linear model (Rönnegård et al., 2010) with the same fixed effects as described for estimation of genetic parameters. The "polygenic\_hglm" function was used to fit a polygenic model, with a GRM that was created by GenABEL using the ibs() function with the weight = "no" option. The "mmscore" function, which is designed to test for association between a trait and genetic polymorphism in samples of related individuals, was used with residuals from polygenic\_hglm analysis. The mmscore function uses the formula

$$\frac{((\text{G} - \text{E[G]})\text{V}^{-1}\text{residualY})^2}{(\text{G} - \text{E[G]})\text{V}^{-1}(\text{G} - \text{E[G]})} \tag{2}$$

where G is the vector of SNP genotypes, E[G] is a vector of mean genotypic values, V<sup>−1</sup> is the inverse of the variance-covariance matrix, and residualY are residuals from the trait analysis with polygenic\_hglm. Together, polygenic\_hglm and mmscore function similarly to the FASTA (family-based score test for association) method implemented by Chen and Abecasis (2007).

TABLE 2 | Genotype quality metrics provided by Affymetrix and requirements that were used in quality control filtering.

<sup>1</sup>For detailed description of metrics see Axiom Analysis Suite User Guide (Applied Biosystems, 2017).
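The score statistic in Equation 2 can be written out for a single SNP as below. This is a sketch of the formula as stated in the text, not the GenABEL source code; it assumes E[G] is the sample mean genotype, and the name `mmscore_statistic` is hypothetical.

```python
import numpy as np


def mmscore_statistic(genotype, v_inv, residual_y):
    """Score test statistic of Equation 2 for one SNP.

    genotype:   length-n vector of SNP genotypes (G)
    v_inv:      (n, n) inverse of the variance-covariance matrix (V^-1)
    residual_y: length-n vector of residuals from the polygenic model
    E[G] is taken as the sample mean genotype (an assumption of this sketch).
    """
    g = np.asarray(genotype, dtype=float)
    r = np.asarray(residual_y, dtype=float)
    v = np.asarray(v_inv, dtype=float)
    d = g - g.mean()                      # G - E[G]
    numerator = (d @ v @ r) ** 2
    denominator = d @ v @ d
    return numerator / denominator
```

With unrelated animals (V<sup>−1</sup> equal to the identity matrix), the statistic reduces to the squared covariance between centred genotype and residual, divided by the genotype sum of squares.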

#### Multiple Test Correction

Genotypes were divided by chromosome and then further divided into chromosomal segments, each containing a number of SNPs equal to half the number of animals, as described by Waide et al. (2017). The number of independent tests was determined as the sum, across segments, of the number of principal components that accounted for 95% of the variance between genotypes within each segment (Σn). This number of independent tests was used in a Bonferroni correction to set the 20% suggestive genome-wide threshold at 0.2/Σn.
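The counting procedure above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' script: each segment is a column-centred animals-by-SNPs matrix, principal components are obtained by singular value decomposition, and the function names are hypothetical.

```python
import numpy as np


def n_components_for_variance(segment, proportion=0.95):
    """Number of principal components explaining `proportion` of the variance.

    segment: (n_animals, n_snps) genotype matrix for one chromosomal segment.
    """
    x = np.asarray(segment, dtype=float)
    x = x - x.mean(axis=0)                     # centre each SNP
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2           # proportional to PC variances
    total = variances.sum()
    if total == 0.0:                           # monomorphic segment
        return 0
    cumulative = np.cumsum(variances) / total
    # first index at which the cumulative proportion reaches the target
    return int(np.searchsorted(cumulative, proportion) + 1)


def suggestive_threshold(segments, alpha=0.2, proportion=0.95):
    """Bonferroni-style threshold alpha / (sum of components over segments)."""
    n_tests = sum(n_components_for_variance(s, proportion) for s in segments)
    return alpha / n_tests
```

Summing components within segments, rather than running one decomposition on all SNPs, keeps each matrix small and treats segments as independent of one another.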

#### RESULTS

#### Viral Load

Pre-challenge samples had no measurable virus copies, as expected (data not shown). Distributions of viral load 2 and 6 dpi are shown in **Figure 1**. Viral load was significantly different between 2 and 6 dpi (P < 0.0001). Viral load was greater at 2 than 6 dpi for all but 38 birds (8%) (**Figure 2**). By 6 dpi, 22 birds fell below our limit of detection for measurable viral RNA, indicative of viral clearance.

#### Antibody

Distributions of dam, chick pre-challenge, and chick 10 dpi antibody are shown in **Figure 3**. Pre-challenge anti-NDV antibody levels were measurable but significantly lower than antibody levels at 10 dpi (P < 0.0001) for all but 19 birds (4%) (**Figure 4**). These 19 birds were excluded from association analysis for both antibody time points. Antibody levels measured in dams were significantly higher than either pre-challenge or at 10 dpi in their chicks.

#### Growth Rate

**Figure 5** shows the population average growth rate pre- and post-challenge and corresponding body weight box plots. Growth rate post-challenge was significantly greater (P < 0.0001) than growth rate pre-challenge.

## Phenotypic Correlations

Viral load at 2 and 6 dpi were positively correlated (**Table 3**). Pre-challenge antibody level was negatively correlated with both pre- and post-challenge growth rate. Post-challenge growth rate was negatively correlated with viral load at 2 dpi and antibody level pre-challenge, but positively correlated with pre-challenge growth rate.

FIGURE 1 | Distribution of viral load at 2 and 6 days post-infection (dpi) after log10 transformation. The bar at 0 for 6 dpi reflects the 22 individuals that had no detectable viral RNA at 6 dpi. These individuals were recorded as having 0 viral RNA copies.

FIGURE 2 | Individual data and box plots for viral load at 2 and 6 dpi. Red lines indicate birds that decreased viral load from 2 to 6 dpi. Blue lines indicate birds that exhibited increased viral load from 2 to 6 dpi. 22 birds did not have detectable virus at 6 dpi.

#### Heritabilities

Heritabilities estimated using AsReml4 were moderate (0.18 to 0.32) for viral load (**Table 1**). Estimates of heritability for pre- and post-challenge antibody levels were similar, 0.26 and 0.24, respectively. Estimates of heritability for pre- and post-challenge growth rate were moderate, 0.46 and 0.21, respectively.

#### Genetic Correlations

The estimate of the genetic correlation between viral load at 2 and 6 dpi was high, 0.74 ± 0.21 (**Table 3**). Viral load at 6 dpi and antibody at 10 dpi were negatively correlated (−0.39 ± 0.33); birds with more antibodies had lower viral load. Most pathogen challenge-related traits, with the exception of viral load at 6 dpi, were negatively correlated with growth rate pre- and post-challenge (−0.30 to −0.72). The two measures of growth rate had a high positive genetic correlation of 0.72. Standard errors for genetic correlation estimates were moderate, leading some estimates to not differ from 0.

FIGURE 4 | Individual data and box plots for antibody pre-challenge and 10 dpi. Red lines and boxplots indicate animals that increase antibody levels in response to challenge (pre to 10 dpi). Blue lines and boxplots indicate animals that do not increase antibody levels in response to challenge. Dams included in blue and red boxplots produced offspring that decreased and increased antibody levels, respectively.


TABLE 3 | Estimates (SE) of phenotypic (above diagonal) and genetic (below diagonal) correlations based on bivariate analyses.

#### Alternative Phenotypes

Several alternative phenotypes generated by combination and/or manipulation of individual phenotypes collected in this study were explored: viral load and antibody change over time (difference between time points), viral load clearance (difference between time points divided by 2 dpi level), regression of viral load and antibody measurements over time. However, none were more heritable than the individual phenotypes and most did not have heritability different from 0. Thus, they were not included further in this study.
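Two of the derived phenotypes listed above can be sketched as follows. The sign conventions are assumptions, since the text does not state them: change is taken as the later value minus the earlier one, and clearance as the reduction from 2 to 6 dpi relative to the 2 dpi level. The function name is hypothetical.

```python
import numpy as np


def alternative_phenotypes(vl_2dpi, vl_6dpi):
    """Change over time and clearance from two viral load measurements.

    Assumed conventions (not stated in the source):
      change    = 6 dpi value minus 2 dpi value
      clearance = (2 dpi value minus 6 dpi value) / 2 dpi value
    Requires a non-zero 2 dpi value for every bird.
    """
    vl2 = np.asarray(vl_2dpi, dtype=float)
    vl6 = np.asarray(vl_6dpi, dtype=float)
    change = vl6 - vl2
    clearance = (vl2 - vl6) / vl2
    return change, clearance
```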

#### Association Analysis

After quality control, 476 animals and 340,527 SNPs remained for association analysis. Principal component analysis determined that 44,364 components accounted for 95% of the variance between SNPs. Using 44,364 as the number of independent tests and applying Bonferroni correction, the 20% genome-wide significance threshold was 4.508 × 10<sup>−6</sup>, which was used to declare suggestive associations.
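As a quick check of the arithmetic, the suggestive threshold follows directly from the reported number of independent tests:

$$\frac{0.2}{44{,}364} \approx 4.508 \times 10^{-6}$$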

Manhattan plots for viral load at 2 and 6 dpi are in **Figures 6**, **7**, respectively. One SNP on chromosome 4 was associated with viral load at 6 dpi, while none were associated with viral load at 2 dpi (**Table 4**). Association analysis results are reported for antibody pre-challenge and 10 dpi, excluding the 19 birds that did not increase antibody in response to NDV challenge. Manhattan plots for antibody level pre-challenge and at 10 dpi are in **Figures 8**, **9**, respectively. Three SNPs were associated with antibody level pre-challenge, while one SNP was associated with antibody level at 10 dpi (**Table 4**). Two SNPs were associated with growth rate pre-challenge, while none were associated with growth rate post-challenge (**Figures 10**, **11** and **Table 4**).

### DISCUSSION

#### Genetic Parameters

Heritabilities for all traits were estimated to be moderate to high, ranging from 0.18 for viral load at 6 dpi to 0.46 for growth rate pre-challenge. Our heritability estimates for antibody levels at 10 dpi are in line with those reported by Lwelamira et al. (2009) in two Tanzanian chicken ecotypes measured just prior to and 2 weeks post-vaccination (0.27 and 0.29). Peleg et al. (1976) estimated heritability of antibody response to attenuated NDV at 12 dpi to be 0.31 based on the sire variance components. To our knowledge, ours is the first report of heritability for viral load of NDV and growth rate in layer-type birds. The moderate to high heritabilities estimated in this study indicate that all investigated traits can be influenced by selective breeding. Therefore, the means for these traits can be changed over generations.

Negative genetic correlations between pathogen response traits and growth rates indicate that selection for decreased viral load at 2 dpi and for decreased antibody levels is expected to increase pre- and post-challenge growth rate. Many studies have found immune response traits and production/growth traits under challenge to be negatively genetically correlated (Gross et al., 2002; Lwelamira et al., 2009; Hess et al., 2016). Given this information, we can speculate that higher antibody levels, which are often viewed as favorable, may be unfavorable when the desired outcome is to increase disease tolerance. Tolerance is defined as the ability of a host to limit the negative impact of infection (viral in this case) on performance (Bishop, 2012). Tolerance is a good goal for NDV in low income countries, where the virus is relatively ubiquitous and the majority of animals will be infected by the virus at some point in their life. Furthermore, it has been suggested that host tolerance places less pressure on the virus to evolve (Råberg et al., 2009). It must be recognized, however, that standard errors for genetic correlation estimates were moderate, leading some estimates to not differ from 0. A larger sample size will be needed to determine the true significance of genetic correlations.



<sup>1</sup>Chromosome:base pair. <sup>2</sup>Location of positional candidate gene (bp from the SNP). <sup>3</sup>SNP is within a coding sequence but does not result in a residue change. <sup>4</sup>SNP is fixed for alternate alleles in Fayoumi and Leghorn inbred lines. <sup>5</sup>Siwek et al. (2006). Xu et al. (1998). Nassar et al. (2012). Siwek et al. (2004). Tatsuda and Fujinaka (2001). Uemoto et al. (2009).

The genetic correlation between pre- and post-challenge growth and viral load at 6 dpi was positive, which does not fit the previously mentioned negative trend between pathogen response traits and growth rates, although SE estimates were large. The resource allocation argument may provide an explanation in this case (Gross et al., 2002; Rauw, 2012). Birds that have higher viral load at 6 dpi also have higher pre- and post-challenge growth rates because they use more of their available resources to grow as opposed to clearing the virus.

#### Viral Load

The 38 birds (8%) that increased viral load from 2 to 6 dpi represent a different kinetic profile of viral clearance than the rest of the population. Although these 38 birds exhibited a different pattern of viral clearance, there was no evidence for lack of infection or interference of response to challenge. There is no evidence that these birds were less challenged, as all 38 had measurable viral load at 2 dpi, indicating they were infected with NDV. Furthermore, none of the 38 birds were half- or full-sibs to the 19 birds that did not produce antibody in response to challenge. Viral load heritability estimates were not increased by excluding these 38 birds. Therefore, these birds were not excluded from any analyses.

No SNPs reached the suggestive threshold for viral load at 2 dpi, while one SNP reached that threshold for viral load at 6 dpi. For this SNP, located on chromosome 4 at 53 Mb, four genes were located within 1 Mb. This QTL was previously identified in association with Marek's disease-related traits (Xu et al., 1998). The closest gene, ANKRD50, was previously found to be down-regulated in tracheal epithelial cells of an inbred research line of Fayoumi chickens at 2 dpi with NDV compared to non-infected birds (Deist et al., 2017b). Chickens from Fayoumi and Leghorn inbred lines were used in a companion study that had the same experimental design, used the same virus, and measured the same phenotypes as the current study. Deist analyzed transcriptome responses of trachea, lung, and Harderian gland to NDV challenge (Deist et al., 2017a,b, 2018b). Zhang reported transcriptomic changes in the spleen (Zhang et al., 2018). The Fayoumi and Leghorn lines are highly inbred (Fleming et al., 2016) and their responses to various pathogens, including velogenic NDV (Lakshmanan et al., 1996; Cheeseman et al., 2007; Kim et al., 2008; Wang et al., 2014; Deist, 2018a), demonstrate that the Fayoumi and Leghorn lines represent relatively resistant and susceptible genetic research models, respectively. ANKRD50 functions in endosome to plasma membrane transport (Kvainickas et al., 2017). This is the first reported association of ANKRD50 with viral infection.

#### Antibody

Detectable pre-challenge antibody at 20 days of age was not expected, because many reports have demonstrated clearance of maternally transferred antibody by this age (Rose and Orlans, 1981; Liu and Higgins, 1990; Grindstaff et al., 2003; Hamal et al., 2006). However, the dams of challenged chicks were 'hyperimmunized,' as they had received 5 immunizations for NDV prior to production of the chicks used in this study. Thus, we believe that the passive maternal antibody still circulating at 20 days of age may have interfered with the response to NDV challenge, specifically in the 19 chicks that did not increase level of antibody between pre-challenge and 10 dpi. Maternal antibody interference with vaccine response is a known phenomenon (Richey and Schmittle, 1962; Eidson et al., 1976). Because these 19 chicks were likely unable to respond to the vaccine appropriately because of passive antibody interference, we conducted analyses both with and without these chicks included. Heritability of pre-challenge antibody increased from 0.20 to 0.26 with exclusion of these birds. The same trend of increasing heritability was seen for antibody at 10 dpi, from 0.19 to 0.24. Three suggestive QTL were found when excluding these 19 birds, while only two of the three were found when using the full dataset. These analyses provide evidence that passive antibody interference caused "noise" in the antibody response data; therefore these 19 birds were excluded from the association analysis for antibody pre-challenge and at 10 dpi. We expect that dams in low-income countries would also have relatively high amounts of anti-NDV antibodies due to high environmental levels of NDV and repeated exposure to the virus.

Pre-challenge antibody did not differ significantly between the three replicates suggesting that maternal antibody transfer level did not differ significantly due to the time between the three hatches. Dam antibodies were measured from plasma, while chick antibodies were measured from serum. Previous studies have shown that antibody measured in plasma and serum are highly correlated (Cherpes et al., 2003; Siev et al., 2011), suggesting the validity of comparing levels of antibody between dams' plasma and chicks' serum in the current study.

Three SNPs, in two QTL, were suggestively associated with antibody level pre-challenge. The strongest association was on chromosome 3 at 38.2 Mb. This SNP was within an intron of B3GALNT2. B3GALNT2 was previously found to be more highly expressed in the Harderian gland of Fayoumis compared to Leghorns at 2 days post-NDV inoculation (Deist et al., 2018b). B3GALNT2 functions in protein glycosylation (Stevens et al., 2013). We present a novel association of B3GALNT2 with viral infection. One gene, GPR137B, was near the SNP, 213,694 bp upstream. GPR137B is a lysosomal integral membrane protein predicted to function in signal transduction (Gao et al., 2012). The QTL on chromosome 3 for antibody level pre-challenge was previously associated with antibody titer to LPS antigen (Siwek et al., 2006).

The second QTL for antibody pre-challenge on chromosome 10 contained two SNPs. The strongest SNP within the chromosome 10 QTL was within the intron of the LACTB gene. This SNP was fixed for alternate alleles in the Fayoumi and Leghorn lines, evaluated by 600k Axiom Chicken Genotyping Array data from 10 birds per line. LACTB promotes intra-mitochondrial membrane organization through polymerization (Polianskyte et al., 2009). This is the first identified association of LACTB with antibody production.

The second SNP within the chromosome 10 QTL was near two genes, LINGO1 and HMG20A. LINGO1 was previously found to be down-regulated in the lung of Fayoumi chickens at 2 dpi with NDV, compared to non-infected birds (Deist et al., 2017a). LINGO1 was also less expressed in the Harderian gland of Fayoumis compared to Leghorns at 2 days after challenge with NDV (Deist et al., 2018b). When comparing the expression in the lung of non-challenged birds, Fayoumi chickens expressed more LINGO1 than Leghorns (Deist et al., 2017a). LINGO1 is a transmembrane protein functioning in signal transduction (Mi et al., 2004). HMG20A exhibited more expression in the lung of non-challenged Leghorn chickens compared to Fayoumi chickens (Deist et al., 2017a). In tracheal epithelial cells at 2 and 10 days post-NDV infection, Leghorns expressed more HMG20A than Fayoumis (Deist et al., 2017b). HMG20A has been shown to bind to viral DNA in vitro (Hsiao et al., 2006). The second antibody pre-challenge QTL on chromosome 10 was previously associated with antibody titer to LTA antigen (Siwek et al., 2006).

Antibody level at 10 dpi was associated with one SNP, located at 3.9 Mb on chromosome 21. Three genes were nearby, TARDBP, APITD1, and CASZ1. CASZ1 was downregulated 2 days post-NDV challenge in tracheal epithelial cells of Leghorn chickens compared to non-challenged birds (Deist et al., 2017b). TARDBP functions in negative regulation by host of viral transcription (GO biological process) and was previously implicated as part of the influenza-host interactome using human and mammalian cell lines in vitro (Heaton et al., 2016).

#### Growth Rate

Compared to the management guide for the Hy-Line Brown commercial layers, our birds had higher body weights across all weeks partially due to the inclusion of male chicks in our experimental population (Hy-Line International, 2016). However, the growth rate trajectories between our birds and the management guide are roughly parallel, suggesting we are not seeing a large depression due to challenge.

Growth rate pre-challenge was associated with two SNPs on chromosomes 10 and 2. The SNP on chromosome 10 was within the MAPK6 gene, which functions in phosphorylation. This QTL co-localizes with a previous association for carcass weight (Nassar et al., 2012).

The ER81 gene, near the SNP for pre-challenge growth rate on chromosome 2, functions in transcription regulation. This QTL has previously been identified as associated with body weight in three independent populations (Tatsuda and Fujinaka, 2001; Siwek et al., 2004; Uemoto et al., 2009).

## Support of Expression Studies for Suggestive SNP Associations

Incorporating previous gene expression data can improve the value of GWAS data, especially when significant expression data coincide with suggestive (near-significant) SNPs (Cheng et al., 2013). The SNP on chromosome 5 with the lowest p-value of association with viral load at 2 dpi (**Figure 6**) is within a gene (PAMR1) that was previously found to be differentially expressed in tracheal epithelial cells at 2 and 6 days post-NDV infection (Deist et al., 2017b). At both time points, Leghorn chickens expressed higher levels of PAMR1 compared to Fayoumis. The Leghorn chickens in the referenced study had significantly more viral genome transcripts in the trachea at 2 dpi. Perhaps the difference in viral load between the two lines is due in part to expression differences in this gene, which would provide support for the near-significant GWAS results.

We identified a suggestive QTL on chromosome 4 at 53 Mb for viral load at 6 dpi (**Figure 7**). Several SNPs in the location of the QTL fell just below the threshold. One of these SNPs (p-value of association 5.22 × 10<sup>−5</sup>) is within the ADAMTS3 gene. In samples from the birds utilized in this GWAS study, the ADAMTS3 gene was shown to be down-regulated in the spleen 6 days after NDV challenge (Zhang et al., 2018). Integration of this information provides further evidence for the existence of the identified QTL for viral load at 6 dpi on chromosome 4.

The OFD1 gene encompasses three SNPs within the near-significant QTL on chromosome 1 for growth rate post-challenge (**Figure 11**). OFD1 functions in primary cilium organization and assembly (Ferrante et al., 2006). This gene was shown to exhibit lower expression in the Harderian gland of Leghorns compared to Fayoumis, 2 days after NDV challenge (Deist et al., 2018b). Perhaps the differential expression of OFD1 contributes to the susceptible/resistant phenotypes of the Leghorn/Fayoumi lines. Overall, OFD1 may play a role in NDV tolerance, that is, performance (growth) under challenge.

## CONCLUSION

Six suggestive QTL associated with response to NDV and/or growth were identified. Some were novel and others confirmed previously reported associations with related traits. Additionally, previous RNA-seq analysis provided support for several of the genes located in or near the QTL of the current study. Considering the trend of negative genetic correlation between antibody and Newcastle Disease tolerance (growth under disease) and estimates of moderate to high heritability, we provide evidence that these NDV response traits can be influenced through selective breeding. This information can inform breeding decisions for the production of chickens that will be raised in NDV endemic areas once more knowledge of the relationship of antibody and viral load with mortality is obtained. Producing chickens that perform favorably in challenging environments will ultimately increase the supply of quality protein for human consumption.

#### AVAILABILITY OF DATA AND MATERIALS

The data that support the findings of this study are available from Hy-Line International but restrictions apply to the availability of these data, which were used under license for the current study, and thus are not publicly available. However, data are available from the authors upon reasonable request to SL and with permission of Hy-Line International.

#### REFERENCES


## AUTHOR CONTRIBUTIONS

HZ, RG, JD, and SL designed the study. KR, AW, JD, and SL performed the animal experiments and collected phenotypes. KR performed the lab work related to quantifying phenotypes and DNA extraction. KR performed the data analysis with inputs from AW, JD, HZ, and SL. KR wrote the initial draft of the manuscript. KR, AW, RG, TK, HZ, JD, and SL provided critical revision. All authors read and approved the final manuscript.

#### FUNDING

KR was supported by a USDA National Needs Fellowship (2013-38420-20496). This work was supported by USAID Feed the Future Innovation Lab for the Genomics to Improve Poultry and Hatch project #5357. This study is made possible by the generous support of the American people through USAID. The contents are the responsibility of the Feed the Future Innovation Lab for Genomics to Improve Poultry and do not necessarily reflect the views of USAID or the United States Government.

## ACKNOWLEDGMENTS

We thank Hy-Line International for providing the birds used in this experiment as well as the Lamont lab members, especially Michael Kaiser, for their help in organization and sample collection.



Zhang, J., Kaiser, M. G., Deist, M. S., Gallardo, R. A., David, A. B., Kelly, T. R., et al. (2018). Transcriptome analysis in spleen reveals differential regulation of response to Newcastle disease virus in two chicken lines. Sci. Rep. 8, 1–13. doi: 10.1038/s41598-018-19754-8

**Conflict of Interest Statement:** Hy-Line International made the in-kind contribution of the animals studied. AW is employed by Hy-Line International and Iowa State University. AW helped with the animal experiments and phenotype collections and provided input on data analysis. Hy-Line International is interested in the outcome of this experiment in regards to their commercial product, but this had no influence on the outcomes of the experiment or this manuscript.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Rowland, Wolc, Gallardo, Kelly, Zhou, Dekkers and Lamont. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Overexpression of Chicken IRF7 Increased Viral Replication and Programmed Cell Death to the Avian Influenza Virus Infection Through TGF-Beta/FoxO Signaling Axis in DF-1

Tae Hyun Kim1,2 and Huaijun Zhou1,2 \*

<sup>1</sup> Department of Animal Science, University of California, Davis, Davis, CA, United States, <sup>2</sup> Integrative Genetics and Genomics Graduate Group, University of California, Davis, Davis, CA, United States

During mammalian viral infections, interferon regulatory factor 7 (IRF7) partners with IRF3 to regulate the type I interferon response. In chickens, however, it is still unclear how IRF7 functions in the host innate immune response, especially given that IRF3 is absent. To further elucidate the functional role of chicken IRF7 during avian influenza virus (AIV) infection, we generated inducible IRF7 overexpression DF-1 cell lines and performed in vitro infection using low pathogenic AIVs (LPAIVs). Overexpression of IRF7 resulted in higher viral replication of H6N2 and H10N7 LPAIVs compared to empty vector control cells regardless of IRF7 expression level. In addition, a high rate of induced cell death was observed due to elevated level of IRF7 upon viral infection. RNA-seq and subsequent transcriptome analysis of IRF7 overexpression and control cells discovered candidate genes possibly controlled by chicken IRF7. Functional annotation revealed potential pathways modulated by IRF7 such as TGF-beta signaling pathway, FoxO signaling pathway and cell structural integrity related pathways. Next, we analyzed the host response alteration due to the IRF7 overexpression and additionally discovered the possible connection of chicken IRF7 and JAK-STAT signaling pathway. These findings suggest that chicken IRF7 could modulate a wide range of cellular processes in the host innate immune response thus meticulous control of IRF7 expression is crucial to the host in response to AIV infection.

Keywords: AIV, avian influenza, chicken, DF-1 cell line, IRF7, overexpression, RNA-seq

#### Edited by:

Mark S. Fife, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Irit Davidson, Kimron Veterinary Institute, Israel; Sascha Trapp, INRA Centre Val de Loire, France

> \*Correspondence: Huaijun Zhou hzhou@ucdavis.edu

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 30 May 2018 Accepted: 06 September 2018 Published: 25 September 2018

#### Citation:

Kim TH and Zhou H (2018) Overexpression of Chicken IRF7 Increased Viral Replication and Programmed Cell Death to the Avian Influenza Virus Infection Through TGF-Beta/FoxO Signaling Axis in DF-1. Front. Genet. 9:415. doi: 10.3389/fgene.2018.00415

## INTRODUCTION

Avian influenza virus (AIV) is one of the major pathogens that significantly impacts the poultry industry worldwide (Olsen et al., 2006; Swayne, 2012). For example, recent high pathogenic avian influenza (HPAI) outbreaks from late 2014 to mid-2015 in the United States resulted in the death of more than 50 million birds, an estimated 12% of the layer chickens and 8% of the meat turkeys raised in the United States that year (Jhung and Nelson, 2015; Ramos et al., 2017). Declined production and HPAI-related trade restrictions further contributed to the significant economic loss to the industry (Ramos et al., 2017). Current strategies for controlling AIV primarily rely on passive measures such as quarantine and slaughter, partially due to our limited understanding of the chicken antiviral response compared to mammals (Goossens et al., 2013). A better understanding of the host antiviral response in chickens could provide critical information to develop improved prevention strategies as well as novel therapeutics against AIV (Downing et al., 2009; Magor et al., 2013).

Interferons (IFNs) are known to trigger host innate immune responses against viral infection by activating signal transduction pathways (Der et al., 1998; de Weerd et al., 2007), and currently more than 3,800 IFN regulated genes have been reported according to the Interferome database (Rusinova et al., 2013). IRF7 is well known as the master transcription factor that interacts with IRF3 to initiate the type I IFN response in mammals (Honda et al., 2005; Honda and Taniguchi, 2006). Even though chickens can induce type I IFNs robustly in response to viral infection (Der et al., 1998), avian species lack IRF3, and the precise molecular and cellular mechanisms of IFN regulation remain to be elucidated (Huang et al., 2010; Santhakumar et al., 2017). Based on this, we hypothesized that the chicken IRF7 may have a conserved function in regulating antiviral response in chickens, yet the signaling cascade and mechanism of action could be species-specific.

In our recent analysis of stable overexpression and knockdown of IRF7 in chicken DF-1 cell lines followed by mimicking viral infection with dsRNA analog poly(I:C), we demonstrated that the primary function of IRF7 as type I IFN regulator may be conserved (Kim and Zhou, 2015). Constitutive overexpression of IRF7 resulted in upregulation of IFNB upon poly(I:C) induction whereas IRF7 knockdown caused downregulation of IFNA (Kim and Zhou, 2015). Further transcriptome analysis revealed more than 60 novel candidate genes that are potentially regulated by IRF7, suggesting a distinct function of chicken IRF7 (Kim and Zhou, 2015). Another study demonstrated that the knockdown of IRF7 by siRNA limited IFNA, IFNB, and STAT1 mRNA expression and increased Newcastle disease virus replication in chicken embryonic fibroblasts (CEFs), suggesting the functional role of IRF7 as a type I IFN regulator (Wang Y. et al., 2014).

To further elucidate the functional role of chicken IRF7 in the context of AIV infection, we took advantage of the inducible expression system to control the expression level of IRF7 in DF-1 cells and infected the established cell lines with two low pathogenic AIV (LPAIV) strains. Correlation between the IRF7 expression level and the AIV replication phenotype was investigated with different levels of IRF7 induction. In addition, we analyzed the transcriptome of IRF7 overexpression and control cells by RNA-seq after LPAIV or mock infection to examine candidate genes and pathways that are potentially modulated by IRF7 upon AIV infection.

## MATERIALS AND METHODS

#### Expression Plasmid Construction

Chicken IRF7 coding sequence (CDS, KP\_096419) was cloned into the piggyBac (pB) cumate-inducible expression plasmid (System Biosciences, Mountain View, CA, United States), which controls the expression level by a cumate gene switch (pB-CuO-IRF7). The inducible vector co-expresses the repressor CymR and a puromycin resistance gene driven by the constitutive EF1α promoter (**Figure 1A**). Constitutively expressed CymR binds to the CuO promoter to repress gene expression; addition of cumate changes the conformation of CymR, which turns on the gene switch by releasing the repressor from the promoter.

## Inducible IRF7 Overexpression Cell Line Establishment

Immortalized chicken embryonic fibroblast DF-1 cells (ATCC, Manassas, VA, United States) were cultured in Dulbecco's modified Eagle's medium (Thermo Fisher Scientific, Waltham, MA, United States) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, Waltham, MA, United States) and 1× Antibiotic-antimycotic (Thermo Fisher Scientific, Waltham, MA, United States), and incubated at 37°C in a humidified atmosphere containing 5% CO<sub>2</sub>. For efficient integration, empty vector (Control) or chicken IRF7 inducible expression vector (pB-CuO-IRF7) was co-transfected with the pB transposase plasmid into DF-1 cells using the Lipofectamine 3000 reagent (Thermo Fisher Scientific, Waltham, MA, United States) according to the manufacturer's protocol. Puromycin (3 µg/ml) was added to the culture media 48 h after transfection and stably integrated cell lines were selected for 2 weeks. To induce IRF7 expression, cumate (4-Isopropylbenzoic acid, Sigma-Aldrich, St. Louis, MO, United States) was added to the culture media at 12 h after seeding for 24 h, followed by subsequent in vitro experiments.

#### Quantitative Reverse Transcriptase PCR

Total RNA was isolated from approximately 1 million cells using the Direct-zol RNA MiniPrep Kit (Zymo Research, Irvine, CA, United States), and complementary DNA (cDNA) was synthesized from total RNA (500 ng) using the Verso cDNA Synthesis Kit (Thermo Fisher Scientific, Waltham, MA, United States). Quantitative reverse transcriptase PCR (qRT-PCR) was performed using the Applied Biosystems 7500 Fast Real-Time PCR System (Life Technologies, Grand Island, NY, United States) with SYBR Select Master Mix (Life Technologies, Grand Island, NY, United States). IRF7, IFNA, and IFNB expression was normalized to the chicken glyceraldehyde 3-phosphate dehydrogenase (GAPDH) gene using the ΔΔC<sub>T</sub> method (Livak and Schmittgen, 2001; Kim and Zhou, 2015).
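The normalization step can be illustrated with a minimal sketch of the comparative C<sub>T</sub> calculation of Livak and Schmittgen (2001), in which the fold change is 2 raised to the negative ΔΔC<sub>T</sub>. The function name, argument names, and C<sub>T</sub> values below are hypothetical and are not data from this study; the sketch also assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCT) method.

    Each argument is a list of CT values from technical replicates.
    """
    # dCT: target CT normalized to the reference gene (here GAPDH)
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCT: treated sample relative to the control sample
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical CT values chosen only to make the arithmetic easy to follow.
fold = relative_expression(
    ct_target_treated=[22.0, 22.0],   # IRF7 in CuO-IRF7 cells
    ct_ref_treated=[18.0, 18.0],      # GAPDH in CuO-IRF7 cells
    ct_target_control=[25.0, 25.0],   # IRF7 in control cells
    ct_ref_control=[18.0, 18.0],      # GAPDH in control cells
)
print(fold)  # ddCT = (22 - 18) - (25 - 18) = -3, so fold = 2^3 = 8.0
```

A lower C<sub>T</sub> means earlier amplification, so a negative ΔΔC<sub>T</sub> corresponds to upregulation in the treated sample.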

#### Virus and in vitro AIV Infection

A/Chicken/California/2000 (H6N2) and A/Chicken/California/1999 (H10N7) low pathogenic avian influenza virus (LPAIV) strains were kindly provided by Dr. Rodrigo Gallardo (University of California, Davis, CA, United States) and Dr. Peter Woolcock [University of California, Davis, California Animal Health and Food Safety (CAHFS)], respectively. Each LPAIV was propagated in Madin-Darby Canine Kidney (MDCK) cells as described in Eisfeld et al. (2014). All in vitro AIV infections were performed in CellBIND 12-well tissue culture plates (Corning, NY, United States) with 1 × 10<sup>6</sup> cells per well at seeding. To characterize viral replication kinetics, established DF-1 cell lines were induced and infected with either H6N2 or H10N7 at a multiplicity of infection (MOI) of 0.01 with 0.05 µg/ml TPCK-trypsin in DMEM. Culture supernatants were collected at 0, 12, and 24 h post-infection (hpi), and the viral titer of each sample was measured by plaque assay using MDCK cells (Huprikar and Rabinowitz, 1980). For the transcriptome profiling study, non-induced cell lines were either mock infected or infected with H6N2 at an MOI of 1 with 0.05 µg/ml TPCK-trypsin in DMEM. At 6 hpi, when the IRF7 expression level starts to peak upon AIV infection (Kim and Zhou, 2015), the cell monolayer was washed twice with PBS and Trizol reagent (Thermo Fisher Scientific, Waltham, MA, United States) was added directly to it to extract total RNA.

libraries. Replicates from each experimental group are separated by IRF7 expression level and H6N2 infection.
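For the infections described in this section, the amount of virus per well follows directly from the MOI and the cell number. The sketch below works through that arithmetic for the two MOIs used here (0.01 and 1) at 1 × 10<sup>6</sup> cells per well. The stock titer is a hypothetical value chosen for illustration, since the titers of the virus stocks are not given in the text, and the function names are likewise illustrative.

```python
def pfu_required(moi, cells_per_well):
    """Plaque-forming units needed per well: MOI x number of cells."""
    return moi * cells_per_well


def inoculum_volume_ul(moi, cells_per_well, stock_titer_pfu_per_ml):
    """Volume of virus stock (in microliters) that delivers the required PFU."""
    return pfu_required(moi, cells_per_well) / stock_titer_pfu_per_ml * 1000.0


CELLS_PER_WELL = 1e6              # seeding density stated in the text
HYPOTHETICAL_STOCK_TITER = 1e7    # PFU/ml; assumption, not from the study

for moi in (0.01, 1.0):
    pfu = pfu_required(moi, CELLS_PER_WELL)
    vol = inoculum_volume_ul(moi, CELLS_PER_WELL, HYPOTHETICAL_STOCK_TITER)
    print(f"MOI {moi}: {pfu:.0f} PFU per well, {vol:.1f} ul of stock")
```

With these assumptions, an MOI of 0.01 corresponds to 10,000 PFU per well and an MOI of 1 to 1,000,000 PFU per well. In practice the cell number at the time of infection may differ from the seeding number, which this sketch ignores.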

#### RNA Sequencing and Data Analysis

A total of eight cDNA libraries were prepared from two biological replicates of each group (mock- or H6N2-infected Control or CuO-IRF7). RNA sequencing libraries were prepared from poly-adenylated RNA and sequenced on an Illumina HiSeq4000, which generated over 20 million 150 bp paired-end reads per sample. The read files from the RNA-seq analysis have been deposited in NCBI's Gene Expression Omnibus under GEO Series accession number GSE115131. We checked the quality of each library with FastQC (version 0.11.6) and trimmed adaptor sequences with TrimGalore (version 0.4.5). We aligned the trimmed fastq files to the galGal5 chicken genome using the STAR aligner (version 2.6.0) with NCBI annotation release 103 (Dobin et al., 2013). Unmapped reads were aligned against the H6N2 (A/chicken/CA/6643/2001) genome (Webby et al., 2002). Raw read counts were extracted from each aligned BAM file with HTSeq (version 0.9.0) and used to identify differentially expressed genes (DEGs) (Anders et al., 2015). Both the DESeq2 and edgeR R packages were used to identify DEGs, and the DEG sets from both packages were combined [false discovery rate (FDR) < 0.1% in any one of the packages] (Robinson et al., 2010; McCarthy et al., 2012; Love et al., 2014). The combined DEG lists were further filtered by removing low-expression genes for which both samples of a given comparison had a fragments per kilobase per million reads (FPKM) value of less than 1. Functional annotation of significant DEGs was performed using DAVID 6.8 (Dennis et al., 2003; Huang da et al., 2009). The enriched gene ontology (GO) terms for biological processes and the pathways obtained from the DAVID functional analysis were filtered for significance by gene count ≥ 5 and p-value < 0.05.
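The filtering logic described above (union of the two DEG sets, removal of genes with low expression in both groups, and thresholds on enriched terms) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' scripts: the function names, gene names, and all numeric values are hypothetical, the inputs are assumed to be plain dictionaries exported from DESeq2, edgeR, and DAVID, and the FDR cutoff is passed explicitly as a fraction (0.001 is used here as one reading of the stated "0.1%").

```python
def combine_degs(deseq2_fdr, edger_fdr, fpkm_a, fpkm_b, fdr_cutoff, min_fpkm=1.0):
    """Union of genes significant in either tool, minus lowly expressed genes.

    deseq2_fdr, edger_fdr: dicts mapping gene -> FDR from each package.
    fpkm_a, fpkm_b: dicts mapping gene -> FPKM in the two compared groups.
    A gene is dropped only if its FPKM is below min_fpkm in BOTH groups.
    """
    significant = {g for g, q in deseq2_fdr.items() if q < fdr_cutoff}
    significant |= {g for g, q in edger_fdr.items() if q < fdr_cutoff}
    return sorted(
        g for g in significant
        if not (fpkm_a.get(g, 0.0) < min_fpkm and fpkm_b.get(g, 0.0) < min_fpkm)
    )


def filter_terms(terms, min_count=5, max_p=0.05):
    """Keep enriched terms with gene count >= min_count and p-value < max_p."""
    return [t for t in terms if t["count"] >= min_count and t["pvalue"] < max_p]


# Hypothetical inputs for illustration only.
deseq2 = {"geneA": 0.0001, "geneB": 0.5, "geneC": 0.0002, "geneD": 0.9}
edger = {"geneA": 0.01, "geneB": 0.0005, "geneC": 0.0003, "geneD": 0.8}
fpkm_control = {"geneA": 12.0, "geneB": 0.4, "geneC": 0.2, "geneD": 30.0}
fpkm_irf7 = {"geneA": 30.0, "geneB": 2.5, "geneC": 0.5, "geneD": 31.0}

degs = combine_degs(deseq2, edger, fpkm_control, fpkm_irf7, fdr_cutoff=0.001)
print(degs)  # geneC is significant but below 1 FPKM in both groups, so it is removed

terms = [
    {"term": "cell adhesion", "count": 12, "pvalue": 0.001},
    {"term": "hypothetical term", "count": 3, "pvalue": 0.001},
]
print(filter_terms(terms))
```

Taking the union rather than the intersection keeps genes called by either package, which is more inclusive and is the behavior the text describes.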

#### Hoechst 33342/Propidium Iodide Staining

Control or CuO-IRF7 cells were seeded at 1 × 10<sup>5</sup> cells/well in µ-Plate 96 Well glass-bottom plates (ibidi GmbH, Germany) treated with attachment factor protein (Thermo Fisher Scientific, Waltham, MA, United States) 24 h before infection and were infected at an MOI of 1 in three independent replicates. At 6 and 12 hpi, each well was washed twice with PBS and stained with 1 µg/ml each of Hoechst 33342 (Thermo Fisher Scientific, Waltham, MA, United States) and propidium iodide (Thermo Fisher Scientific, Waltham, MA, United States) in PBS at room temperature for 10 min. Two images were taken from each replicate using a 20× objective lens and captured with Nikon NIS 3.0 software. The total number of cells (blue) and the number of dead cells (red) were counted in all images using ImageJ (Wayne Rasband, National Institutes of Health, United States) software.
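The cell-death percentage derived from these counts can be sketched as the ratio of propidium iodide-positive (dead) to Hoechst-positive (total) nuclei per image, summarized across images. The counts below are hypothetical, the function names are illustrative, and the use of the sample standard deviation is an assumption, since the text does not state which measure of spread is reported.

```python
from statistics import mean, stdev


def percent_dead(total_counts, dead_counts):
    """Per-image percentage of dead cells: 100 x PI-positive / Hoechst-positive."""
    return [100.0 * dead / total for total, dead in zip(total_counts, dead_counts)]


def summarize(percentages):
    """Mean and sample standard deviation across images."""
    return mean(percentages), stdev(percentages)


# Hypothetical counts for six images (two images x three replicates).
total = [200, 200, 200, 200, 200, 200]   # Hoechst-stained nuclei (blue)
dead = [40, 50, 60, 40, 50, 60]          # PI-stained nuclei (red)

pct = percent_dead(total, dead)
avg, sd = summarize(pct)
print(f"{avg:.1f} +/- {sd:.1f}% dead cells")
```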

## RESULTS

### Inducible Over-Expression of IRF7 in DF-1 Cell Line

To precisely control the expression level of IRF7 in vitro, stable IRF7 inducible overexpression (CuO-IRF7) DF-1 cells were established using a cumate inducible vector (**Figure 1A**). Empty vector control cells (Control), which have identical expression cassettes without the IRF7 CDS, were also generated to exclude potential noise from random gene disruption due to vector integration. We induced the established cell lines with cumate concentrations from 0 to 100 µg/ml for 24 h and measured the resulting IRF7 expression level by qRT-PCR (**Figure 1B**). Without induction, IRF7 expression was approximately 10-fold higher in CuO-IRF7 cells than in the control cells (**Supplementary Figure S1**). IRF7 expression in CuO-IRF7 cells was titratable, rising as the cumate concentration increased (up to 15-fold upregulation vs. non-induced cells), and the induction levels of IRF7 were highly reproducible. In contrast, the IRF7 expression level was not affected by cumate induction in control cells.

### LPAIV Replication in the Induced Cell Lines

Two LPAI virus strains, H6N2 and H10N7, were used at an MOI of 0.01 to infect the cumate-induced cell lines, and the correlation between the IRF7 expression level and AIV replication was analyzed. Both CuO-IRF7 and control cell lines were induced with 0, 20, or 40 µg/ml of cumate for 24 h, which corresponds to approximately 10-, 50-, and 100-fold overexpression of IRF7 in CuO-IRF7 cells compared to the control cells at the time of infection. AIV replication dynamics were then analyzed at 12 and 24 h post-infection (hpi) by MDCK-based plaque assay (**Figure 1C**). At 12 hpi, no significant difference in viral replication was observed between IRF7 overexpression and control cells for either LPAI virus strain at any level of induction. At 24 hpi, we observed significantly increased viral replication in the IRF7 overexpression cell lines compared to the control cell lines for both virus strains, ranging from 1.6-fold (H6N2, 20 µg/ml) to fivefold (H10N7, 0 µg/ml) (**Figure 1D**). However, we did not observe any significant correlation between the induced IRF7 expression level and the viral titer in the overexpression cells.

### Type I IFN Regulation by IRF7

We additionally measured the expression levels of type I IFNs in mock- or H6N2-infected (MOI of 1) control and overexpression cells to examine the role of IRF7 as a type I IFN regulator (**Figure 1E**). In mock-infected cell lines, overexpression of IRF7 resulted in an almost threefold upregulation of IFNB, whereas the IFNA level did not show significant upregulation. There was no significant difference in the expression of IFNA and IFNB between the Control and CuO-IRF7 cells upon infection, despite upregulation of IRF7 upon infection in each cell line (Control: 1.44-fold, FDR = 0.039; CuO-IRF7: 1.31-fold, FDR = 0.098, DESeq2).

### Transcriptome Analysis

To identify the candidate genes and signaling pathways that IRF7 may regulate during AIV infection, we performed RNA-seq analysis on four experimental groups of cells. Because there was no significant difference in viral replication among the different IRF7 overexpression levels, non-induced DF-1 cells with either mock or H6N2 infection were used and harvested at 6 hpi (**Figure 1C**). Each condition had two independent biological replicates, and principal component analysis showed that the replicates in each condition grouped together by either IRF7 expression level or H6N2 infection condition (**Figure 1F**). The alignment rate against the galGal5 reference genome averaged 84.98% for the mock infection libraries and decreased to 71.57% with H6N2 infection, as approximately 15% of the total reads were mapped to the viral genome in H6N2-infected cell lines.

from (E) mock infection and (F) H6N2 infection condition. Number of genes enriched in each biological process is in parentheses.

### Differential Expression Analysis of IRF7 Overexpression

First, by directly contrasting the transcriptomes of the control and IRF7 overexpression cell lines, we were able to identify the DEGs that were potentially regulated by IRF7 either at basal condition or upon H6N2 infection. There were 1,002 DEGs (465 up- and 537 down-regulated) in the mock condition and 804 DEGs (408 up- and 396 down-regulated) in H6N2 infection (**Figures 2A,B** and **Supplementary Table S1**). Comparison of the two DEG lists showed that 470 genes overlapped between the mock and infection conditions (**Figure 2C**). The gene expression heatmap showed distinct expression patterns between control and CuO-IRF7 cells among the 470 common DEGs (**Figure 2D**). Gene ontology (GO) analysis of all DEGs revealed enriched functions involved in cell structural integrity or cellular assembly, such as cell adhesion, extracellular matrix (ECM) organization, integrin signaling mediated processes, and apoptosis (**Figures 2E,F**). Furthermore, we performed pathway analysis with the same gene sets to examine which pathways could potentially be altered as a result of IRF7 overexpression. Antibiotic and steroid biosynthesis pathways were significantly enriched in the mock condition, while arginine, proline, and pyrimidine metabolism were significantly enriched upon infection (**Figures 3A,B**). In addition, cell structure and ECM related pathways were enriched in both conditions (**Figures 3A,B**). Pathway analysis also revealed that the FoxO signaling, TGF-beta signaling, and PPAR signaling pathways, which are related to immune regulation, could possibly be regulated by IRF7 (**Figure 3A**). DEGs in the FoxO signaling and TGF-beta signaling pathways are presented in **Supplementary Figure S2**. Heatmaps with fold changes of all individual DEGs from significantly enriched pathways are presented in **Figures 3C–H**. Most of the integrin (ITG), laminin (LAM), and collagen (COL) DEGs were up-regulated in both conditions, except for ITGA9, LAMB1, and COL6A2, which were down-regulated (**Figure 3D**). TGFB1 and SMAD3 were up-regulated whereas PTEN and MYC were down-regulated in the TGF-beta signaling pathway in both conditions (**Figure 3E**).

### IRF7 Overexpression Increased Cell Death Upon AIV Infection

Based on the enriched GO terms and signaling pathways, we further investigated the effect of IRF7 overexpression on cell viability using Hoechst 33342/PI nucleic acid staining and fluorescence microscopy. While Hoechst dyes can penetrate living cells, propidium iodide stains only dead cells. Representative images of Hoechst 33342/PI staining are shown in **Figure 4A** (6 hpi) and **Figure 4B** (12 hpi). At 6 hpi, approximately twice as many CuO-IRF7 cells (48.4 ± 5.3%) as control cells (24.2 ± 4.7%) had died upon H6N2 infection (**Figure 4C**).

### Role of IRF7 in the Host Response Against H6N2 Infection

Next, we compared the host responses against H6N2 infection between the cell lines (Control/Mock vs. Control/H6N2 and CuO-IRF7/Mock vs. CuO-IRF7/H6N2) to investigate which genes and pathways were potentially modulated by IRF7 upon infection (**Supplementary Table S2**). We identified 564 activated genes and 886 repressed genes in control cells upon H6N2 infection (**Figure 5A**), and a similar number of DEGs were identified in overexpression cells (**Figure 5B**; 557 activated genes, 735 repressed genes). There were 704 genes in common between the two contrasts, and 746 and 588 genes were DEGs unique to the control and CuO-IRF7 contrasts, respectively (**Figure 5C**). GO and pathway analysis using DEGs from the control contrast represented the molecular signature of the host response against AIV infection (**Figures 5D,E**), while the CuO-IRF7 contrast could identify unique genes and signaling pathways associated with AIV infection that were potentially regulated by IRF7 (**Figures 5F,G**). Yet, despite the difference in functional annotation observed between the two contrasts, the individual genes did not show dramatic fold change differences between them (**Supplementary Figure S3**).
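As a quick consistency check, the overlap counts reported for the two infection contrasts can be reproduced from the activated and repressed gene numbers given in the same paragraph. The variable names are illustrative; all numbers are taken from the text.

```python
# DEG counts reported for each infection contrast
control_degs = 564 + 886   # activated + repressed in control cells
irf7_degs = 557 + 735      # activated + repressed in CuO-IRF7 cells
shared = 704               # DEGs common to both contrasts

# DEGs unique to each contrast are the totals minus the shared genes
unique_control = control_degs - shared
unique_irf7 = irf7_degs - shared

print(control_degs, irf7_degs)      # 1450 1292
print(unique_control, unique_irf7)  # 746 588
```

The unique counts of 746 and 588 match the values stated for the control and CuO-IRF7 contrasts.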

Then, we further analyzed the interaction between IRF7 overexpression and H6N2 infection to identify the genes that were differentially regulated by IRF7 during the infection (**Supplementary Table S3**). The top 50 interaction genes from a total of 350 DEGs (FDR < 5%) are listed in **Figure 6A**. GO analysis (**Figure 6B**) demonstrated that IRF7 overexpression could affect genes involved in cellular assembly, organization, and structural functions as well as apoptosis. Pathway analysis (**Figure 6C**) was also consistent with the candidate pathways enriched in the previous analyses shown in **Figures 3A,B**, **5F,G**. Of particular note, the Janus kinase/signal transducer and activator of transcription (JAK-STAT) signaling pathway was significantly represented in this analysis, although it was not enriched in any of the above direct contrast functional annotations (**Figure 6D**). In addition, the TGF-beta signaling pathway was also significantly enriched (**Figure 6E**).
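What an interaction effect measures in a two-factor design like this one (cell line × infection) can be shown with a small numeric sketch: it is the difference between the infection response in CuO-IRF7 cells and the infection response in control cells, on the log2 scale. The text does not give the exact statistical model, so this is only an illustration of the concept and not the DESeq2 or edgeR fitting procedure. The function names and expression values are hypothetical.

```python
import math


def log2_fold_change(treated, baseline):
    """log2 ratio of mean expression in the treated vs. baseline group."""
    return math.log2(treated / baseline)


def interaction_log2fc(ctrl_mock, ctrl_infected, irf7_mock, irf7_infected):
    """Difference of infection responses: (CuO-IRF7 response) - (control response)."""
    response_control = log2_fold_change(ctrl_infected, ctrl_mock)
    response_irf7 = log2_fold_change(irf7_infected, irf7_mock)
    return response_irf7 - response_control


# Hypothetical normalized expression values for one gene.
effect = interaction_log2fc(
    ctrl_mock=100.0, ctrl_infected=200.0,   # twofold induction in control cells
    irf7_mock=100.0, irf7_infected=800.0,   # eightfold induction in CuO-IRF7 cells
)
print(effect)  # log2(8) - log2(2) = 3 - 1 = 2.0
```

A gene with an interaction effect near zero responds to infection similarly in both cell lines, even if it is strongly induced in each, which is why this analysis can surface pathways that the direct contrasts do not.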

## DISCUSSION

Here, we employed a functional genomics approach to investigate the functional role of chicken IRF7 in the host innate immune response to AIV infection by generating an in vitro inducible overexpression model followed by whole transcriptome sequencing. Overexpression of IRF7 resulted in higher viral replication as well as greater cell death in our in vitro model. The transcriptome analysis suggested that chicken IRF7 might be involved in modulating a wide range of cellular processes including programmed cell death via the TGF-beta, FoxO, and JAK-STAT signaling pathways.

In this study, a cumate inducible system was applied to finely control the overexpression of IRF7 (a 10-fold change by qRT-PCR, and a twofold change by RNA-seq), and its range may better reflect the actual physiological expression of IRF7 (**Figure 1B**) than the constitutive overexpression system we developed in our previous study, which resulted in almost 200-fold overexpression (Kim and Zhou, 2015). Even with this substantially lower IRF7 overexpression level, we were able to observe significant IFNB upregulation (**Figure 1E**). This further supports a regulatory role of IRF7 on type I IFNs in chickens (Kim and Zhou, 2015). Furthermore, the relatively lower level of upregulation or repression patterns of IFNs upon H6N2 infection despite upregulation of IRF7 could suggest that manipulation of the host immune system by AIV affects the link between IRF7 and type I IFNs (Alcami and Koszinowski, 2000).

Interestingly, IRF7 overexpression resulted in a significantly higher viral titer in the overexpression cells than in control cells regardless of the IRF7 induction level (**Figure 1D**). However, it is yet to be determined whether the higher progeny viral titer resulting from IRF7 overexpression is detrimental or beneficial to the host, as we also observed increased levels of induced cell death (**Figure 4**). Programmed cell death pathways such as apoptosis and necroptosis are host defense strategies known to limit viral infection by eliminating the environment in which viruses replicate (Upton and Chan, 2014; Orzalli and Kagan, 2017), while influenza viruses have adapted to inhibit the apoptosis process and benefit their survival by modulating the host response (Zhirnov et al., 2002; Ehrhardt et al., 2007). On the other hand, influenza virus can induce immense host cell death for effective replication and transmission, which results in morbidity, pathogenesis, and virulence (Brydon et al., 2005; Ludwig et al., 2006). Further in vivo investigation could test the hypothesis that IRF7 overexpression is beneficial to the host by promoting programmed cell death to limit the virus.

Pathway analysis suggested both the TGF-beta and FoxO signaling pathways as potential mechanisms that could be modulated by chicken IRF7 in the host response to AIV. The TGF-beta signaling pathway has diverse functions in cells and tissues, including cell-cycle control, differentiation, extracellular matrix formation, and apoptotic activation (Massague and Chen, 2000; Schuster and Krieglstein, 2002). The FoxO signaling pathway also plays important roles in metabolism, stress resistance, cellular proliferation, and apoptosis through the Forkhead box (FOX) family of transcription factors. The TGF-beta and FoxO signaling pathways are often considered together because their mechanisms of action are closely associated (Naka et al., 2010; Zhang et al., 2011; Wang Z. et al., 2014). TGF-beta induces apoptosis in a SMAD-dependent manner (Jang et al., 2002; Schuster and Krieglstein, 2002), and we observed upregulation of TGFB1 and SMAD3 and downregulation of the anti-apoptotic gene BCL2 upon IRF7 overexpression. This reflects the increased apoptosis and implies IRF7 regulation of the process (**Figures 2A,B**, **3E** and **Supplementary Figure S2**). Qing et al. (2004) suggested the regulation of IRF7 function by TGF-beta/Smad3 signaling as a possible mechanism of the host type I IFN response in mouse embryonic fibroblasts, yet our results suggest a possible regulation of TGF-beta/Smad3 signaling by chicken IRF7.

FIGURE 3 | Pathway analysis of DEGs and their expression patterns. Pathway analysis enrichment using DEGs between Control and CuO-IRF7 cells in (A) mock infection and (B) H6N2 infection conditions. Number of genes in each pathway is in parentheses. (C–H) Heatmaps showing the expression fold change of DEGs in significantly enriched pathways in mock infection (left column) and H6N2 infection (right column). Pathways were combined based on the common genes across the pathways. Lists of DEGs from both conditions in each pathway were combined to generate each heatmap.

FOXO3 was shown to be a negative regulator of IRF7 gene transcription in mouse macrophages, in which FOXO3 binds directly to the IRF7 promoter and could control its transcription (Litvak et al., 2012). Another study suggested a negative regulatory role of FOXO1 in the cellular antiviral response by promoting the ubiquitination of IRF3 and subsequent IRF3 protein degradation (Lei et al., 2013). Both studies suggest a possible regulatory circuit in which some FOXO proteins control IRF3 or IRF7 to prevent an excessive innate immune response that could result in a pathological outcome. Differential expression of FOXO3, FOXO4, and FOXO6 as a result of IRF7 overexpression (**Figure 3E** and **Supplementary Figure S2A**) may suggest possible conservation in chickens of the feedback circuits controlling the antiviral response associated with FOXO transcription factors.

(E) TGF-beta signaling pathway.

The death receptor (DR) signaling pathway is another arm of the extrinsic apoptotic process (Orzalli and Kagan, 2017). There have been reports regarding the detrimental effects of IFNs in influenza virus infection, in which excessive IFN levels lead to severe damage to the host (McNab et al., 2015). Influenza virus susceptible mouse strains were found to have a stronger and more sustained type I IFN signal than resistant strains, and antagonizing the type I IFN signal in susceptible strains improved host survival and reduced inflammation (Davidson et al., 2014). It was suggested that the disease-promoting effects of IFN are possibly mediated by upregulation of apoptosis-inducing proteins such as TNF-related apoptosis-inducing ligand (TRAIL) and its receptor DR5 (TNFRSF10B) or Fas cell surface death receptor (FAS), which could lead to tissue damage in somatic cells and immunosuppression of immune cells (Fujikura et al., 2013; Hogner et al., 2013; Davidson et al., 2014). In our study, TRAIL was not differentially expressed and downregulation of DR5 and FAS (**Figures 2A,B**) was observed upon IRF7 overexpression, which suggests that the IRF7 mediated apoptosis may not utilize the death receptor mediated mechanism.

Necroptosis is now recognized as an alternative to apoptosis as a mechanism of controlled cell death. Necroptosis has a distinct regulation mechanism compared to unintentional cell death by necrosis, and it is a better inducer than apoptosis of the strong proinflammatory response that is crucial to the host immune response (Mocarski et al., 2015; Orzalli and Kagan, 2017). A protective role of Receptor Interacting Serine/Threonine Kinase 3 (RIPK3) has been reported against viral infections including influenza A virus, and RIPK3 is known as a key factor that determines whether infected cells undergo apoptosis or necroptosis upon viral infection (Newton et al., 2014; Nogusa et al., 2016; Thapa et al., 2016; Daniels et al., 2017). In this study, RIPK3 was also up-regulated by IRF7 overexpression; together with the increased cell death, this might suggest a novel functional role of chicken IRF7 as a potential regulator of the necroptosis pathway against AIV infection.

In addition, chicken IRF7 may regulate genes involved in the cell structural integrity or cellular assembly such as cell adhesion, ECM organization, and adherens junctions (**Figures 2E,F**, **3A,B**) which have a wide range of functions in the host response to viral infection. Actin cytoskeleton plays an important role in the entry of influenza virus into cells and proper assembly of viral particles (Sun and Whittaker, 2013; Kumakura et al., 2015). ECMs are also critical across many stages of the viral life cycle, including viral entry, transmission, and exit (Stavolone and Lionetti, 2017). Integrin mediated cell adhesion to ECM is essential for survival of many cell types (Meredith and Schwartz, 1997), and apoptotic cells undergo distinct morphological changes characterized by cell and nucleus shrinkage as well as disassembly into apoptotic bodies which are associated with structural proteins such as actins and laminins (Saraste and Pulkki, 2000; Suzanne and Steller, 2009). Actin initiates and mediates mammalian apoptosis via the intrinsic and extrinsic pathways and final degradation of actin filaments amplifies the apoptosis signaling cascade. Differential regulation of these structural proteins (mostly upregulated by IRF7 overexpression, **Figure 3D**) may contribute to the increased apoptosis and viral production yet the precise mechanism remains to be further elucidated.

Furthermore, interaction analysis of IRF7 overexpression and H6N2 infection was performed to discover the genes that were differentially regulated during the infection due to IRF7 overexpression. The analysis not only reinforced our findings from the direct contrast analyses (**Figures 2**, **3**), which suggested a modulatory role of IRF7 in a wide range of host responses to H6N2 infection, but also discovered additional genes and pathways that were possibly modulated by IRF7. For example, the JAK-STAT signaling pathway was not enriched in any of the direct contrast analyses but was significantly enriched among the interaction DEGs. The JAK-STAT pathway is one of the key pathways in the type I IFN response and employs interferon-stimulated genes to inhibit virus infection by targeting the viral life cycle and regulating host processes (Schneider et al., 2014; Majoros et al., 2017). In our study, Tyrosine kinase 2 (TYK2, one of the JAKs) and its negative regulators SOCS1 and SOCS3, as well as the anti-apoptosis genes BCL-XL and PIM1, were differentially regulated by IRF7 during the H6N2 infection.

In sum, overexpression of IRF7 resulted in higher viral replication as well as increased cell death in DF-1 cell lines. Although it is unclear in our in vitro model whether the increased viral replication due to IRF7 overexpression is beneficial to the host or to the virus, our results suggest a potential modulatory function of chicken IRF7 in programmed cell death via the TGF-beta-FOXO signaling axis in the host response. In addition, we revealed sets of candidate genes that IRF7 might regulate in cellular structure organization, highlighting cell-cell adhesion processes that play an important role in both the host response and the viral life cycle. Chicken IRF7 was also involved in metabolic pathways that have known functions in the antiviral response. An ongoing complete loss-of-function study of IRF7 by CRISPR-Cas9 will expand our knowledge of the potential regulatory role of IRF7-dependent and -independent pathways in AIV infection in poultry. A genome-wide IRF7 binding study by ChIP-seq would also help by determining whether the DEGs were directly modulated by IRF7 binding or by the cascade of downstream type I IFN responses.

## AUTHOR CONTRIBUTIONS

TK and HZ conceived and designed the experiments, contributed reagents, materials, analysis tools, analyzed the data, and wrote the paper. TK performed the experiments.

## FUNDING

This work was supported by United States Department of Agriculture (USDA), National Institute of Food and Agriculture (NIFA), Multistate Research Project NRSP8 and NC1170 (HZ), and the California Agricultural Experimental Station (HZ). TK was partially supported by Fellowships Grant Program Award #2017-67011-26762 from the USDA NIFA.

## ACKNOWLEDGMENTS

The two LPAIV strains used in this study were gifts from the labs of Dr. Rodrigo Gallardo and Dr. Peter Woolcock. Ms. Ganrea Chanthavixay and Ms. Karen Tracy provided valuable input on the revision of the manuscript.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00415/full#supplementary-material

#### REFERENCES


transcriptional activation of the beta interferon promoter. Mol. Cell. Biol. 24, 1411–1425. doi: 10.1128/MCB.24.3.1411-1425.2004


RIPK3-dependent cell death. Cell Host Microbe 20, 674–681. doi: 10.1016/j. chom.2016.09.014


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kim and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Applications of Gene Editing in Chickens: A New Era Is on the Horizon

Hicham Sid and Benjamin Schusser\*

*Department of Animal Sciences, Reproductive Biotechnology, School of Life Sciences Weihenstephan, Technical University Munich, Freising, Germany*

The chicken represents a valuable model for research in the areas of immunology, infectious diseases, and developmental biology. Although it was the first livestock species to have its genome sequenced, there was no reverse genetic technology available to help understand specific gene functions. Recently, homologous recombination was used to knock out the chicken immunoglobulin genes. Subsequent studies using immunoglobulin knockout birds helped to understand different aspects related to B cell development and antibody production. Furthermore, the latest advances in the field of genome editing, including the CRISPR/Cas9 system, allowed the introduction of site-specific gene modifications in various animal species. Thus, it may provide a powerful tool for the generation of genetically modified chickens carrying resistance to certain pathogens. This was previously demonstrated by targeting the Trp38 region, which was shown to be effective in the control of avian leukosis virus in chicken DF-1 cells. Herein we review the current and future prospects of gene editing and how it may contribute to the development of chickens resistant to infectious diseases.

#### Edited by:

*Mark S. Fife, Pirbright Institute (BBSRC), United Kingdom*

#### Reviewed by:

*Robert Etches, Crystal Bioscience, United States Jiuzhou Song, University of Maryland, College Park, United States*

#### \*Correspondence: *Benjamin Schusser*

*benjamin.schusser@tum.de*

#### Specialty section:

*This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics*

Received: *29 May 2018* Accepted: *18 September 2018* Published: *09 October 2018*

#### Citation:

*Sid H and Schusser B (2018) Applications of Gene Editing in Chickens: A New Era Is on the Horizon. Front. Genet. 9:456. doi: 10.3389/fgene.2018.00456*

## INTRODUCTION

The chicken represents an important source of protein worldwide and a valuable model for the study of developmental biology in vertebrates (Yasugi and Nakamura, 2000; Speedy, 2003). Chickens are constantly exposed to a plethora of pathogens threatening animal welfare as well as human health (Perdue and Swayne, 2005; Humphrey, 2006). Viral pathogens such as influenza A viruses can be transmitted to humans leading to death (Gao et al., 2013). Furthermore, bacterial agents such as Campylobacter jejuni and Salmonella enteritidis cause food borne illnesses in humans associated with digestive symptoms (Bryan and Doyle, 1995). More recently, using genetically modified chickens as a model for various research areas like developmental biology, immunology, physiology and neurology is gaining importance in the avian research community (Mozdziak and Petitte, 2004; Stern, 2004, 2005). In addition, there is an increasing interest to generate genetically modified chickens resistant to specific pathogens, benefiting from the availability of gene manipulation techniques. This review focuses on the advances made in gene editing in chickens and the future perspectives including the generation of specific-pathogen-resistant birds.

Keywords: chicken, CRISPR/Cas9, transgenic, knockout, Diseases, Immunoglobulins

## STATE OF THE ART

Genetically modified animals have significantly contributed to our understanding of different aspects related to immunity, infectious diseases, neurology, behavior, and developmental biology (Yeh et al., 2002; Lyall et al., 2011; Lalonde et al., 2012; Pinkert, 2014; Park et al., 2017b). While mice were the first animals to be genetically modified (Costantini and Lacy, 1981; Gordon and Ruddle, 1981), pronuclear DNA microinjections allowed the introduction of foreign DNA leading to genetic modifications in livestock including rabbits, sheep and pigs (Hammer et al., 1985). Although this method was used for a long time, it did not allow the induction of targeted gene modifications and had the disadvantage of generating random integrations (Perleberg et al., 2018). The generation of knockout (KO) animals was achieved for the first time by gene targeting in embryonic stem cells (ES) (Evans and Kaufman, 1981; Thomas and Capecchi, 1987). Though the induction of the KO was successful, it had the disadvantage of low efficiency (Thomas and Capecchi, 1987). Due to the absence of true ES lines from farm animals and no solid evidence of germline transmission (Talbot and Blomberg, 2008; Soto and Ross, 2016), stable transfection of sheep somatic cells with human factor IX and neomycin resistance followed by nuclear transfer was the alternative to express foreign DNA in livestock (Schnieke et al., 1997) and afterwards for gene targeting (McCreath et al., 2000). At this time, the generation of KO livestock animals was possible by combining somatic cell nuclear transfer (SNTC) and homologous recombination (Lai et al., 2002; Nottle et al., 2007). The laborious procedure of these methods and the low efficiency for generating targeted KO was improved by homologous recombination (Houdebine, 2002) along with different nucleases (Carlson et al., 2012). 
The transcription activator-like effector nucleases (TALENs) are composed of a series of repeats fused to non-specific FokI-cleavage domains that induce double-stranded DNA breaks upon dimerization (Gaj et al., 2013). More recently, the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 system made the process of specific DNA targeting easier by using single guide RNAs (sgRNAs) (Jinek et al., 2012; Ran et al., 2013; Hsu et al., 2014). CRISPR/Cas9 is an adaptive immune system found in bacterial and archaeal species and uses small non-coding RNAs to guide the Cas9 nuclease to target sites, resulting in DNA double-strand breaks (Jinek et al., 2012).
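As a rough illustration of how a guide RNA designates a target, the sketch below scans one strand of a DNA sequence for 20-nt protospacers that are immediately followed by an NGG motif, the protospacer adjacent motif (PAM) commonly associated with Streptococcus pyogenes Cas9. The PAM requirement, the function name `find_protospacers` and the example sequence are assumptions added here for illustration and are not taken from the studies cited above; practical guide design would also consider the reverse strand and potential off-target sites.

```python
import re


def find_protospacers(seq, guide_len=20):
    """Return (start, protospacer, pam) for each guide_len-nt site followed by NGG.

    Hypothetical helper for illustration: only the given strand is scanned.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    for match in pattern.finditer(seq):
        hits.append((match.start(), match.group(1), match.group(2)))
    return hits


# Made-up sequence, not a chicken locus.
example = "ATGCTAGCTAGGATCCGATCGATCGTACGTAGCTAGCTAGGCTAGCTAGGGATC"
for start, protospacer, pam in find_protospacers(example):
    print(start, protospacer, pam)
```

Each reported site is only a candidate; whether a given guide actually cuts efficiently has to be determined experimentally.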

In comparison to mammals, difficulties were always associated with the generation of genetically modified chickens due to the complex structure of the chicken zygote (Mozdziak and Petitte, 2004) and the different organization of the chick embryo compared to mammals (Stern, 1990). Over the past 30 years, different research groups paved the way for the generation of genetically modified chickens. Efforts were focused on the stable genomic integration of transgenes and obtaining the highest efficiency of germline transmission. While Petitte and colleagues described the transfer of stage X embryo cells that led to germline transmission, it was not possible to genetically modify these cells and to re-introduce them as germline competent cells into the chicken embryo (Petitte et al., 1990). Although ES cells were shown to provide a valuable tool for the generation of transgenic mice (Kanatsu-Shinohara et al., 2003), no evidence of germline transmission using chicken ES cells was reported. Transferred chicken ES cells only contributed to somatic tissue but not to the germline.

The first genetically modified chicken was generated by the insertion of retroviral foreign DNA delivered by avian leukosis virus that was successfully integrated into the germline (Salter et al., 1987). The retroviral vector was injected into the yolk sac near the developing blastoderm. Since then, various viral vectors have been used to generate genetically modified chickens (Hughes et al., 1986; Bosselman et al., 1989; Salter and Crittenden, 1989; Harvey and Ivarie, 2003; Mozdziak et al., 2003). Drawbacks of viral vectors, such as the replication of deficient viral particles and risks of recombination with wild type viruses, were avoided by plasmid-DNA microinjection into the chicken zygote (Love et al., 1994). The microinjection was done in the germinal disk and led to the generation of transgenic chickens expressing neomycin resistance and the reporter gene lacZ (Love et al., 1994). A total of 5.5% of the generated chicks survived to sexual maturity and, later on, one rooster gave 3.4% transmission to his offspring (Love et al., 1994). The germline transmission of integrated transgenes was improved with lentiviral vectors (McGrew et al., 2004). McGrew and colleagues showed the possibility of transduction with lentiviral vectors in G0 birds. Founder cockerels were injected with different plasmids carrying different reporter genes including LacZ and eGFP (McGrew et al., 2004). Lentiviral vectors were injected into the subgerminal cavity of newly laid eggs. Ten of the founder males transmitted 4–45% of the foreign DNA to their offspring (McGrew et al., 2004). Lentiviral vectors offered for the first time the possibility to generate genetically modified chickens with a decent germline transmission efficiency. Nevertheless, the size of the transgene was still limited and precise edits were not possible.

Furthermore, the in ovo injection of the avian retroviral vector RCAS (replication-competent avian sarcoma-leukosis virus with a splice acceptor) carrying enhanced green fluorescent protein (eGFP) into unincubated (stage X) blastoderms resulted in stable and widespread expression of eGFP in the embryos. Even though the gonads showed eGFP expression, PGCs were eGFP negative, indicating viral silencing (Smith et al., 2009).

As in mammals, chicken primordial germ cells (PGCs) are precursors of gametes and a key element for sperm and oocyte development. During the early hours of embryonic development, PGCs are found in the germinal crescent and migrate afterwards (50–55 h) to the gonads (Kim et al., 2010; Kang et al., 2015) in order to produce sperm and oocytes upon sexual maturity (Fujimoto et al., 1976). The migration of PGCs was found to be greatly influenced by the chemokine stromal cell-derived factor-1 (SDF-1/CXCL12) and its receptor C-X-C chemokine receptor type 4 (CXCR4) (Stebler et al., 2004; Lee et al., 2017c).

The ability to culture PGCs was a milestone in the process of generating transgenic chickens. Genetic modification of PGCs and their subsequent reintroduction into the embryonic vasculature resolved many of the problems observed with previously established methods. Van de Lavoir and colleagues used BRL or STO feeder cells to cultivate PGCs for up to 217 days. PGCs were shown to retain the germline characteristics by analyzing various germline markers including the chicken vasa homolog (CVH) and were cryopreserved using conventional techniques (Van De Lavoir et al., 2006). PGC culture was optimized afterwards by Whyte and colleagues, who developed feeder- and serum-free culture conditions that took into consideration the signaling pathways necessary for avian germ cell self-renewal (Whyte et al., 2015). The work of van de Lavoir and colleagues revealed that foreign DNA can be inserted in the genome of PGCs and that the cells remain restricted to the germline (Van De Lavoir et al., 2006; Leighton et al., 2008). Male PGCs were cultured for between 35 and 110 days, during which they were transfected with a construct coding for eGFP, and subsequently injected into the vasculature of White Leghorn embryos [stage 13–15 Hamburger and Hamilton (H&H)] (Van De Lavoir et al., 2006). Interestingly, the long term culture of PGCs did not influence their ability to colonize the gonads after insertion of foreign DNA, which allowed afterwards the generation of several transgenic chicken lines (Van De Lavoir et al., 2006, 2012; Macdonald et al., 2012). Leighton and colleagues gave new insights about increasing the efficiency of foreign DNA insertion in PGCs mediated by phiC31 integrase, which catalyzes site-specific recombination between attB and pseudo attP sites in the chicken genome and increases transgene integration (Leighton et al., 2008).

Lu and colleagues indicated that the piggyBac transposon can be efficiently integrated into the genome of chicken embryo during development via electroporation (Lu et al., 2009). The transfection of PGCs with piggyBac transposon greatly enhanced the integration frequency of foreign DNA into the chicken genome and resulted in the generation of genetically modified chickens (Park and Han, 2012; Glover et al., 2013). In contrast, the injection of piggyBac transposon into the subgerminal cavity of a newly laid egg and subsequent electroporation, resulted in chickens expressing the transgene but no germline transmission was detectable (Liu et al., 2013). At the same time, Tyack and colleagues successfully developed a method for the direct transfection of circulating PGCs using Lipofectamine 2000 in combination with Tol2 transposon and transposase plasmids (Tyack et al., 2013). The plasmid contained the pCAGGS promoter driving the expression of eGFP. Tyack and colleagues found that 5/11 roosters expressed the miniTol DNA in their semen and two of them gave about 1.5% germline transmission (Tyack et al., 2013). This method substantially reduced the time needed for the in vitro isolation and gene manipulation of PGCs; however, it did not increase the germline transmission in G0 (Tyack et al., 2013). In addition, it does not allow clonal selection of PGCs and may result in birds with random integrations of the same transgene. Nevertheless, it is an effective method to produce genetically modified chickens as shown by various publications (Tyack et al., 2013; Lambeth et al., 2016a,b).

The possibility to culture and genetically modify chicken PGCs without losing germline competence made it possible to perform precise gene deletions and integrations in the chicken genome. Specific gene locus KO chickens were generated by Schusser and colleagues via gene targeting by homologous recombination in chicken PGCs (Schusser et al., 2013a, 2016). In the case of the targeted immunoglobulin heavy chain J segment, a total of 7 of 27 PGC clones (28%) had a correctly targeted event, which reflected a high efficiency comparable to mouse ES cells (Schusser et al., 2013a). Similar efficiency was obtained after targeting the immunoglobulin light chain locus in chicken PGCs. After successful targeting of the immunoglobulin heavy or light chain in chicken PGCs, resulting clones were injected into H&H stage 13–15 embryos in order to generate germline chimeras. Germline transmission rates varied between 0.1 and 48% depending on the PGC clone used (Schusser et al., 2013a, 2016). Resulting homozygous immunoglobulin heavy chain J segment knockout birds showed a depletion of peripheral B cells and antibodies and were the first non-mammalian vertebrates harboring a knockout produced by homologous recombination. In order to perform gene knockouts by homologous recombination in PGCs, isogenic DNA is needed since mismatches in the homology regions are not tolerated (Schusser et al., 2016).

Since PGCs are precursors of sperm, researchers suggested that roosters could be used as recipients for exogenous transfer of genetically modified PGCs, which may improve the germline transmission rate (Trefil et al., 2017). Chicken embryos and adult roosters were chemically or physically sterilized to create a surrogate for external PGC donors (Trefil et al., 2006; Nakamura et al., 2008, 2010; Ghadimi et al., 2017). Nakamura and colleagues partially sterilized chicken embryos by injecting Busulfan into the yolk of fertile eggs before incubation; this led to a significant reduction of endogenous PGCs. The authors demonstrated that the sterilized embryos can be used for exogenous transfer of PGCs, resulting in high efficiencies of germline transmission (Nakamura et al., 2008, 2010). Early experiments performed by Trefil and colleagues provided an alternative to chemical sterilization and concluded that repeated gamma irradiation leads to sterilization of roosters (Trefil et al., 2006). Injection of donor spermatogonial cells led to reestablishment of male function in 50% of the roosters only 5 weeks after injection (Trefil et al., 2006). Spermatogenesis was restored 4 weeks later in the case of PGC transplantation compared to spermatogonial cells; however, PGCs exhibited higher efficiency in repopulating the seminiferous epithelium (Trefil et al., 2006, 2017). This was very beneficial in the case of transplantation of genetically modified PGCs into mature roosters after complete irradiation (Trefil et al., 2017). Male fertility was reestablished after the transplantation of GFP- or mCherry-expressing PGCs and resulted in almost 100% germline transmission (Trefil et al., 2017). The prominent advantage of this method is the certainty of the germline transmission and the low number of animals used in the experiment, hence reducing the time and costs of testing a high number of chimeric roosters. 
Although using gamma irradiation to sterilize roosters was as efficient as in mice, recent findings in pigs suggested that the knockout of NANOS2, like in mice, results in specific germline ablation with preserved testicular development (Park et al., 2017a); therefore it was suggested that NANOS2 KO pigs may serve as a surrogate for transplantation of donor spermatogonial cells (Park et al., 2017a). Even though the importance of NANOS2 in the transformation of ES into germ cells is well determined, little is known about its function in chickens. The most important steps made in the process of generating genetically modified chickens are summarized in **Table 1**.


## GENE EDITING IN AVIAN CELL LINES

The unavailability of fully transgenic chickens for a long time encouraged the development of alternative methods based on in vitro cell culture systems. In vitro studies helped to provide valuable data regarding host susceptibility to specific pathogens and the role of specific genes during host-pathogen interactions. DT-40 cells, an avian leukosis virus induced bursal B-cell lymphoma line, were extensively used to investigate B cell immunology, cell cycle regulation, gene conversion and apoptosis (Uckun et al., 1996; Arakawa et al., 2001; Harris et al., 2002; Arakawa and Buerstedde, 2004). A large number of DT-40 mutants were generated to understand B cell biology and were reviewed elsewhere (Arakawa and Buerstedde, 2004). For instance, studies based on DT-40 cells proved that the activation-induced cytidine deaminase (AID) triggers immunoglobulin gene diversification by gene conversion (Buerstedde et al., 1990; Kim et al., 1990). Furthermore, Szüts and colleagues used mutant DT-40 cells to demonstrate the role of RAD18 in DNA repair and the completion of gene conversion (Szüts et al., 2006). Interestingly, Schusser and colleagues replaced the immunoglobulin light and heavy chain loci in DT-40 cells with human immunoglobulin light and heavy chain loci; this led to the expression of chimeric IgM with human variable regions and chicken constant regions (Schusser et al., 2013b). The latter cell line provides a model to study the diversification of the human variable region by gene conversion and somatic hypermutation in chickens. Antigen receptor analyses were performed by deep sequencing, confirming that the host machinery in DT-40 cells diversified the integrated human V genes (Leighton et al., 2015).

A different established model for examining gene function in chickens is the Douglas Foster (DF-1) cell line, an immortalized chicken fibroblast cell line (Foster, 1998). Recent studies used DF-1 cells to investigate host-pathogen interactions of several avian pathogens with the avian host; this included influenza A viruses, Newcastle disease virus, infectious bursal disease virus and retroviruses (Huang et al., 2003; Lee et al., 2008; Cheng et al., 2015; Hui and Leung, 2015). The overexpression of different avian genes in DF-1 cells helped to examine their role in the innate immunity against viral pathogens (Shao et al., 2014; Cheng et al., 2015; Xu et al., 2015). A well-known tool for the overexpression of various genes is the retroviral vectors derived from the SR-A strain of Rous sarcoma virus (RCAS). The RCAS system is known for its stable transduction in the developing chicken embryo and in cell culture (Fekete and Cepko, 1993; Bell and Brickell, 1997). Reuter and colleagues used DF-1 cells for the overexpression of the chicken IFN-α and IFN-λ (Reuter et al., 2014). The overexpression of IFN-λ in DF-1 cells did not cause substantial viral resistance against influenza A viruses H1N1, H7N1, and vesicular stomatitis virus (VSV) (Reuter et al., 2014), which suggested that IFN-λ has weak antiviral activity in DF-1 cells (Karpala et al., 2008). This was not the case for IFN-α, where the overexpression led to protection against the previously mentioned viruses (Reuter et al., 2014). In addition, DF-1 cells were useful to study the function of foreign genes in chicken, including intracellular pattern recognition receptors such as the retinoic acid-inducible gene I (RIG-I). RIG-I from duck and goose was overexpressed in DF-1 cells and its protective effect against influenza A viruses and infectious bursal disease virus (IBDV) was investigated (Barber et al., 2010; Sun et al., 2013; Shao et al., 2014). 
The overexpression of duck RIG-I in DF-1 cells reduced viral replication and upregulated virus-induced apoptosis following IBDV and H9N2 influenza virus infections (Shao et al., 2014). Interestingly, the knockdown of the chicken ANP32A, a nuclear protein implicated in mRNA transport and cell death (Reilly et al., 2014), reduced the activity of different avian influenza polymerases in DF-1 cells. This indicated that avian influenza virus polymerases are more adapted to avian ANP32A and proposed this gene as a target for antiviral drugs (Long et al., 2016). Furthermore, the overexpression of the chicken GADD45β, a protein associated with cell growth control, apoptotic cell death, and the cellular response to DNA damage (Zazzeroni et al., 2003), helped to limit viral infection, which could be used in the future as a potential treatment for avian leukosis virus (ALV)-J infections (Zhang et al., 2016).

ALV is one of the most commonly occurring retroviruses in chickens. It induces a variety of neoplastic lesions causing losses in the productivity of affected chicken flocks (Fadly, 2000). Maas and colleagues confirmed that DF-1 cells are much more suitable than primary chicken fibroblasts (CEFs) to study host-pathogen interactions of leukosis viruses with avian cells (Maas et al., 2006). ALV was detected earlier in DF-1 cells and the infection was associated with an apparent cytopathogenic effect (CPE), whereas infected CEFs had no apparent CPE (Maas et al., 2006). Mutations responsible for the inhibition of ALV subgroup A cell entry were identified and consisted of a four-base-pair insertion and a one-base-pair substitution in tumor virus locus A (tva) (Klucking et al., 2002). On the other hand, a single base pair substitution in the cysteine-rich domain (CRD) of the tvb receptor led to reduced susceptibility of DF-1 cells to infection with ALV subgroup B (Klucking et al., 2002; Reinisová et al., 2008). Interestingly, subgroup J ALV (ALV-J) uses the multimembrane-spanning cell surface protein, the chicken Na+/H+ exchanger type 1 (NHE1), as a receptor. The attachment of the virus to the receptor is crucial to initiate the infection (Barnard et al., 2006). Kučerová and colleagues used mutagenesis to introduce changes in the subgenic fragment of NHE1 and described the functional importance of the tryptophan residue at position 38 (Trp38) for virus entry (Kučerová et al., 2013).

The rapid development of gene editing tools such as CRISPR/Cas9 rendered cell culture systems much more useful by easily targeting different genes. Precise gene editing of the chicken NHE1 gene using the CRISPR/Cas9 system led to resistance of DF-1 cells against ALV-J infection (Lee et al., 2017a). The precise genome editing of NHE1 was performed via homology-directed repair (HDR) that combined CRISPR/Cas9 vectors with single-stranded oligodeoxynucleotides (ssODNs). The authors confirmed previous observations that mutations in Trp38 are detrimental to ALV-J infection (Kučerová et al., 2013). On the other hand, non-homologous end joining repair (NHEJ) was also established in DF-1 cells. Targeting the tumor virus locus B gene, which serves as entry receptor for ALV subgroup B, resulted in frameshift mutations leading to a KO of the tvb receptor in DF-1 cells, which conferred resistance against ALV-B (Lee et al., 2017b). Abu-Bonsrah and colleagues targeted a wide range of genes in DF-1 cells such as DROSHA, DICER, MBD3, KIAA1279, CDKN1B, EZH2, HIRA, TYRP1, STMN2, RET, and DGCR, that play a role in embryonic development and pathogenesis of embryonic diseases (Abu-Bonsrah et al., 2016). Efficiency of inducing mutations was analyzed by T7E1 assay. The efficiency ranged between 20 and 65% in DF-1 cells (Abu-Bonsrah et al., 2016). Similar results were obtained after knocking out the KIAA1279 and CDKN1B genes in the DT-40 cell line via electroporation (Abu-Bonsrah et al., 2016). Likewise, Bai and colleagues gave more insights about the efficiency of CRISPR/Cas9 in DF-1 cells by studying gene editing in the presence and the absence of puromycin antibiotic selection (Bai et al., 2016). Three genes including peroxisome proliferator-activated receptor-γ (PPAR-γ), ATP synthase epsilon subunit (ATP5E), and ovalbumin (OVA) were targeted with CRISPR/Cas9 vectors. 
The T7E1 assay indicated that puromycin selection increased the mutation rate in the previously mentioned genes from 0.75, 0.5, and 3.0% to 60.7, 61.3, and 47.3%, respectively (Bai et al., 2016).
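The studies cited above report mutation rates from T7E1 assays without the underlying calculation being restated here. A commonly used estimate converts the fraction of PCR product cleaved by the enzyme into an indel percentage as 100 × (1 − √(1 − f)), where f is the summed intensity of the cleaved bands divided by the total intensity of all bands. The sketch below implements that estimate; it is an assumption that this formula matches what the cited authors used, and the function name and band intensities are hypothetical.

```python
import math


def indel_percent(uncut, cut_bands):
    """Estimate indel frequency (%) from mismatch-cleavage gel band intensities.

    uncut: intensity of the undigested PCR band.
    cut_bands: intensities of the cleavage product bands.
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cut / total
    # Square root corrects for heteroduplexes formed by random re-annealing.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical densitometry values: 75% of the signal in the cleaved bands.
print(indel_percent(25.0, [50.0, 25.0]))  # prints 50.0
```

The square-root term reflects the assumption that mutant and wild-type strands re-anneal at random, so the cleaved fraction is not itself equal to the mutation rate.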

## GENE EDITING IN THE CHICKEN EMBRYO

The chicken embryo is a well-established model to study developmental processes, gene functions and host-pathogen interactions (Darnell and Schoenwolf, 2000; Chesnutt and Niswander, 2004; Schecterson et al., 2012). Over the last decades, different methods were established to genetically manipulate chicken embryos, including electroporation of foreign DNA constructs, transduction with retroviruses and, recently, the combination of previously known methods with the CRISPR/Cas9 system (Gandhi et al., 2017).

For example, Luo and colleagues established a protocol based on ex ovo electroporation of 3.5-day-old chicken embryos for the overexpression of Cad7 and eGFP (Luo et al., 2012). This method provided access to different embryonic parts for electroporation, which are not easily reachable when the embryo is still inside the egg (Luo et al., 2012). Similarly, in ovo electroporation of the embryonic auditory brainstem was established (Lu et al., 2017). Plasmids of interest were successfully integrated into the nucleus magnocellularis and nucleus laminaris. The authors indicated the possibility of drug-inducible gene expression, which was confirmed in the presence of doxycycline (Lu et al., 2017).

A well-established tool for foreign DNA integration is the RCAS system. Using RCAS in the chicken embryo model indicated that vector proteins and inserted transgenes were mainly detectable in the skin, blood vessels and heart (Sato et al., 2002; Kothlow et al., 2010). Several studies demonstrated the efficacy of the RCAS system for foreign DNA integration and gene overexpression in chicken embryos (Bell and Brickell, 1997; Sato et al., 2002; Kothlow et al., 2010; Schusser et al., 2011; Reuter et al., 2014). This system is very useful to study the specific function of genes relevant for innate immunity, particularly during the interaction with influenza A viruses. RCAS vectors expressing various Mx gene isoforms were used for transduction of CEFs. Four days post-transfection, CEFs expressing the retrovirally transduced Mx proteins were injected into the yolk sac of 3-day-old fertilized eggs (Schusser et al., 2011). The overexpression of different Mx isoforms in embryonated eggs did not protect against influenza A virus infection, which was in agreement with the results obtained from chicken fibroblasts (Schusser et al., 2011). In addition, the role of IFN-λ was previously investigated by the generation of mosaic chicken embryos overexpressing chicken IFN-λ (Reuter et al., 2014). Upon challenge with influenza A viruses, NDV Herts-33, or IBV M-41 via the allantoic cavity, the generated embryos exhibited viral titers that were lower by at least four log<sub>10</sub> units compared to eggs inoculated with empty RCAS vector (Reuter et al., 2014). This clearly demonstrated the protective effect of chicken IFN-λ against different viruses (Reuter et al., 2014). Although the IFN-λ overexpression had detrimental effects in the early hours post hatch (Reuter et al., 2014), the RCAS system was shown to be successful for maintaining transgene expression after hatch (Kothlow et al., 2010).

A similar system based on gene transfer mediated by lentiviral vectors was described in embryonated eggs (Hen et al., 2012). The usefulness of lentiviral vectors in developmental biology was reviewed elsewhere (Stern, 2004). Lentiviral vectors of feline immunodeficiency virus origin were injected into the chorioallantoic membrane (CAM) of 11-day-old chicken embryos. The injected lentiviral vectors carried yellow fluorescent protein (YFP) or recombinant alpha-melanocyte-stimulating hormone (α-MSH) genes, which were expressed under the cytomegalovirus (CMV) promoter (Hen et al., 2012). High efficiency of transduction was observed in the liver, which implied that this model could be useful for the study of hormones and enzymes.

The application of gene editing technologies via in ovo electroporation of chicken embryos seems to be efficient (Wilson and Stoeckli, 2012). Wilson and Stoeckli used miRNA-based plasmids for knocking down gene expression in the chicken neural tube (Wilson and Stoeckli, 2012). Additionally, Gandhi and colleagues used ex ovo electroporation to knock out Pax7 and Sox10, key transcription factors in the neural crest, leading to loss of their proteins and transcripts (Gandhi et al., 2017). Overall, collected data indicated that in ovo gene manipulation of the chicken embryo could be used as a model for the study of different embryonic developmental stages (Gandhi et al., 2017; Lu et al., 2017). The high targeting efficiency and simplicity of CRISPR/Cas9 now make it possible to knock out genes in specific tissues or organs of the developing chicken embryo. This allows the study of gene function during development without generating fully gene edited chicken lines.

## GENERATION OF GENETICALLY MODIFIED CHICKENS

The generation of genetically modified chickens has wide applications in agricultural and biomedical research (Sang, 1994; Ivarie, 2003; Mozdziak and Petitte, 2004). Benefiting from gene editing technologies and germline transmission of PGCs, new knowledge was brought to light about specific gene functions (Schusser et al., 2013a, 2016), resistance to infectious diseases (Lyall et al., 2011) and the possible preservation of endangered species including the Houbara bustard (Kang et al., 2008; Wernery et al., 2010; Van De Lavoir et al., 2012). Different methods used for gene editing in chickens and the generated chicken lines were described earlier in this review. In addition, the worldwide availability of genetically modified chicken lines is summarized in **Table 2**.

Specific gene editing in PGCs was improved using TALEN and CRISPR/Cas9 via HDR (Dimitrov et al., 2016; Oishi et al., 2016; Taylor et al., 2017). Using CRISPR/Cas9, the efficiency of gene targeting was increased remarkably in PGCs (Dimitrov et al., 2016). In order to introduce a loxP site into the immunoglobulin heavy chain locus, Dimitrov and colleagues combined a targeting vector having a total of 2 kb homology arms with CRISPR/Cas9 system targeting the upstream region of the single immunoglobulin heavy chain variable region (VH) in PGCs (Dimitrov et al., 2016). Interestingly, all selected drug resistant PGC clones contained the correct targeting event and the germline transmission rate varied between 0 and 100% depending on the used PGC line (Dimitrov et al., 2016).

Targeting the DDX4 locus, located on the Z chromosome, showed a possible role of this gene in the formation of the germ cell lineage (Taylor et al., 2017). Targeted DDX4 KO was achieved with TALEN in combination with a targeting vector. The authors reported a germline transmission rate of 6% from the founder birds (Taylor et al., 2017). G1 female chicks were hemizygous mutant for DDX4; they did not lay eggs and had no yellow or white follicles in the ovaries. Surprisingly, this was not the case in DDX4 knockout female mice (Tanaka et al., 2000).

Overall, significant progress was made in the last decade in producing and using genetically modified chickens to understand developmental biology, immunology, host-pathogen interaction, reproductive biology and physiology. However, efforts to generate chickens resistant to specific pathogens are still at the beginning, probably due to the lack of specific gene targets responsible for acquiring resistance against specific pathogens. This was not the case in other livestock including pigs, which were genetically edited to gain resistance against porcine reproductive and respiratory syndrome virus (PRRSV) (Whitworth et al., 2015; Burkard et al., 2017). Using NHEJ, Whitworth and colleagues generated KO pigs with a premature stop codon in exon 3 of the viral receptor CD163 (Whitworth et al., 2015). CD163-KO pigs challenged with PRRSV did not exhibit any clinical symptoms, lung pathology, viremia, or antibody response. In addition, Burkard and colleagues generated an exon 7 deletion in CD163 using two sgRNAs to induce the excision of the exon (Burkard et al., 2017). Pigs carrying the mutation were healthy and kept the main biological functions of the protein, while macrophages isolated from the CD163 KO animals indicated an inhibition of the viral infection (Burkard et al., 2017).

So far, only a few reports are available about the resistance of gene-edited chickens to specific pathogens. Lyall and colleagues generated transgenic chickens expressing short-hairpin RNA intended to function as a decoy that interacts with and blocks influenza A virus polymerase (Lyall et al., 2011). Although birds were not resistant to initial infection, viral transmission was prevented (Lyall et al., 2011). A different study demonstrated the possibility to suppress influenza A virus transmission in transgenic birds expressing the 3D8 single chain variable fragment (scFv), which interacts with the viral genome, leading to suppression of viral shedding (June Byun et al., 2017).

#### TABLE 2 | Worldwide availability of genetically modified chickens.

## FURTHER APPLICATIONS IN BIOMEDICAL RESEARCH

The chicken has become a very interesting model in biomedical research. Different temporal patterns of bright light were used to study the effect on myopia in chickens. Lan and colleagues found that intermittent episodes of light suppress myopia in chickens more than continuous bright light (Lan et al., 2014). Although the obtained results may not be directly translated to humans (Lan et al., 2014), future applications in optical research seem to be promising. In addition, the chicken was used as a model for xenotransplantation by injection of human stem cells into small induced lesions in the chicken embryo neural tube (Boulland et al., 2010). The authors stated that the reduced immune response during early embryonic development helps to study xenotransplantation without the risk of early immune rejection (Boulland et al., 2010). The chicken was also used as a human multiple myeloma xenograft model (Martowicz et al., 2015); it was suggested that this model may offer novel therapeutic compounds targeting survival and proliferation of multiple myeloma cells. Using the chicken as a bioreactor may greatly benefit human health by providing alternative therapeutic approaches (Zhu et al., 2005). A promising approach using chickens for the production of human antibodies is the replacement of the chicken immunoglobulin variable regions by human V regions and synthetic pseudogene arrays in order to produce affinity matured human antibodies in chickens (Ching et al., 2018). The OmniChicken by Ligand Pharmaceuticals Inc. is a worldwide unique platform to produce human monoclonal antibodies from chickens, making use of the phylogenetic difference between mammals and birds. The purification of overexpressed human antibodies from the chicken egg also seems to be a valid application, which was reviewed elsewhere (Flemming, 2005). 
A very recent study conducted by Oishi and colleagues demonstrated the feasibility of integrating human interferon beta (hIFN-β) into the chicken ovalbumin locus in order to produce hIFN-β in egg white (Oishi et al., 2018). The authors demonstrated the ability to produce foreign proteins in eggs, which would have industrial and therapeutic applications.

## FUTURE PERSPECTIVES

The role of host genes in the susceptibility of chickens to different pathogens was mostly investigated in vitro. Preliminary in vitro investigations provide solid information about the role of these genes prior to the generation of fully gene edited chickens. New

#### REFERENCES


technologies including CRISPR/Cas9 make the process of gene editing easy and highly efficient in contrast to the well-established process of homologs recombination. Although gene editing in mammals, particularly mice and pigs, is vastly advanced, gene editing in chickens is entering the golden age. For instance the generation of Cas9-expressing pigs will provide a powerful tool for the study of biological processes (Wang et al., 2017); while this was not done yet in chickens, it seems to be beneficial and may be used in the future to dissect unknown gene functions faster and more easily.

Therapeutic applications using human monoclonal antibodies produced from humanized chickens may be beneficial over in vitro approaches lacking affinity maturation (Ching et al., 2018). In addition, production of antibodies in chicken eggs represents an economic and stress-free method for the production of specific antibodies (Amro et al., 2018). Manufacturing specific proteins in chicken eggs seems interesting (Lillico et al., 2005; Petitte and Mozdziak, 2007), especially since it may allow improvement of digestibility of sugar complexes in feedstuffs; however, this application may be thwarted by critics who claim that the product is inedible.

Several advantages are provided by newly invented gene editing technologies, including the simplicity of design and application combined with high efficiency (Chira et al., 2017). Understanding the host cell behavior during host-pathogen interactions may help in targeting pathogen-specific receptors and viral cellular transport (Heaton et al., 2016). The determination of new target genes associated with disease susceptibility should fill the research gap and open the door for new therapeutic approaches. Although the debate about using genetically modified animals in food production will continue, we may obtain new breeds of chickens in the future that are resistant to specific pathogens. We speculate that devoting more effort to connecting gene editing technologies with the prevention of infectious diseases will change the way we fight pathogens and will probably improve animal welfare.

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

## FUNDING

This research was supported by grant Schu2446/3-1 to BS from the Deutsche Forschungsgemeinschaft.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Sid and Schusser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Including Phenotypic Causal Networks in Genome-Wide Association Studies Using Mixed Effects Structural Equation Models

Mehdi Momen<sup>1</sup>, Ahmad Ayatollahi Mehrgardi<sup>1</sup>\*, Mahmoud Amiri Roudbar<sup>1</sup>, Andreas Kranis<sup>2</sup>, Renan Mercuri Pinto<sup>3,4</sup>, Bruno D. Valente<sup>4</sup>, Gota Morota<sup>5</sup>, Guilherme J. M. Rosa<sup>4,6</sup> and Daniel Gianola<sup>4,6,7</sup>

<sup>1</sup> Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran, <sup>2</sup> Roslin Institute, University of Edinburgh, Midlothian, United Kingdom, <sup>3</sup> Department of Exact Sciences, University of São Paulo-Escola Superior de Agricultura Luiz de Queiroz, Piracicaba, Brazil, <sup>4</sup> Department of Animal Sciences, University of Wisconsin, Madison, WI, United States, <sup>5</sup> Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States, <sup>6</sup> Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, United States, <sup>7</sup> Department of Dairy Science, University of Wisconsin, Madison, WI, United States

#### Edited by:

John Anthony Hammond, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Fabyano Fonseca Silva, Universidade Federal de Viçosa, Brazil Gregor Gorjanc, University of Edinburgh, United Kingdom

> \*Correspondence: Ahmad Ayatollahi Mehrgardi mehrgardi@uk.ac.ir

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 21 June 2018 Accepted: 18 September 2018 Published: 09 October 2018

#### Citation:

Momen M, Ayatollahi Mehrgardi A, Amiri Roudbar M, Kranis A, Mercuri Pinto R, Valente BD, Morota G, Rosa GJM and Gianola D (2018) Including Phenotypic Causal Networks in Genome-Wide Association Studies Using Mixed Effects Structural Equation Models. Front. Genet. 9:455. doi: 10.3389/fgene.2018.00455

Network-based statistical models accounting for putative causal relationships among multiple phenotypes can be used to infer single-nucleotide polymorphism (SNP) effects that are transmitted through a given causal path in genome-wide association studies (GWAS). In GWAS with multiple phenotypes, reconstructing underlying causal structures among traits and SNPs using a single statistical framework is essential for understanding the entirety of genotype-phenotype maps. A structural equation model (SEM) can be used for such purposes. We applied SEM to GWAS (SEM-GWAS) in chickens, taking into account putative causal relationships among breast meat (BM), body weight (BW), hen-house production (HHP), and SNPs. We assessed the performance of SEM-GWAS by comparing the model results with those obtained from traditional multi-trait association analyses (MTM-GWAS). Three different putative causal path diagrams were inferred from highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 using the inductive causation algorithm. A positive path coefficient was estimated for BM → BW, and negative values were obtained for BM → HHP and BW → HHP in all implemented scenarios. Further, the application of SEM-GWAS enabled the decomposition of SNP effects into direct, indirect, and total effects, identifying whether a SNP effect is acting directly or indirectly on a given trait. In contrast, MTM-GWAS only captured overall genetic effects on traits, which is equivalent to combining the direct and indirect SNP effects from SEM-GWAS. Although MTM-GWAS and SEM-GWAS use similar probabilistic models, we provide evidence that SEM-GWAS captures complex relationships in terms of causal meaning and mediation and delivers a more comprehensive understanding of SNP effects compared to MTM-GWAS. Our results showed that SEM-GWAS provides important insight regarding the mechanism by which identified SNPs control traits by partitioning them into direct, indirect, and total SNP effects.

Keywords: causal structure, GWAS, multiple traits, path analysis, SEM, SNP effect

## INTRODUCTION

Genome-wide association studies (GWAS) have become a standard approach for investigating relationships between common genetic variants in the genome (e.g., single-nucleotide polymorphisms, SNPs) and phenotypes of interest in human, plant, and animal genetics (Hayes and Goddard, 2010; Brachi et al., 2011; Wang et al., 2012). A typical GWAS is based on univariate linear or logistic regression of phenotypes on genotypes for each SNP individually while often adjusting for the presence of nuisance covariates (Hayes and Goddard, 2010; Sikorska et al., 2013). A statistically significant association indicates that SNPs may be in strong linkage disequilibrium (LD) with quantitative trait loci (QTL) that contribute to the trait etiology. Alternatively, multi-trait model GWAS (MTM-GWAS) can be used to test for genetic associations among a set of traits (Korte et al., 2012; O'Reilly et al., 2012; Zhou and Stephens, 2012). It has been established that MTM-GWAS reduces false positives and increases the statistical power of association tests, explaining the recent popularity of this method. However, MTM-GWAS does not consider various cryptic biological signals that may affect a trait of interest, either directly or indirectly through other intermediate traits.

Complex traits are the product of various cryptic biological signals that may affect a trait of interest either directly or indirectly through other intermediate traits (Falconer and Mackay, 1996). A standard regression cannot describe such complex relationships between traits and QTLs properly. For instance, some traits may simultaneously act as both dependent and independent variables. Structural equation modeling (SEM) is an extended version of Wright's path analysis (Wright, 1921; Gianola and Sorensen, 2004) that offers a powerful technique for modeling causal networks. In a complex genotype-phenotype setting involving many traits, a given trait can be influenced not only by genetic and systematic factors but also by other traits (as covariates). Here, QTLs may not affect the target trait directly; instead, the effects may be mediated by upstream traits in a causal network. Indirect effects may therefore constitute a proportion of perceived pleiotropy, and these concepts apply to sets of heritable traits, organized as networks, that are common in biological systems. An example from dairy cattle production systems, described by Gianola and Sorensen (2004), is that higher milk yield increases the risk of a particular disease, such as mastitis, while the prevalence of the disease may negatively affect milk yield. As another example, Varona et al. (2007) explored a causal link from litter size to average piglet weight in two pig breeds. In humans, obesity is a key factor influencing insulin resistance, which subsequently causes type 2 diabetes. Lists of causal networks across human diseases and candidate genes are described in Kumar and Agrawal (2013) and Schadt (2016).

Although MTM-GWAS is a valuable approach, it only captures correlations or associations among traits and does not provide information about causal relationships. Knowledge of the causal structures underlying complex traits is essential, as correlation does not imply causation. For example, a correlation between two traits, T1 and T2, could be attributed to a direct effect of T1 on T2 or T2 on T1, or to additional variables that jointly influence both traits (Rosa et al., 2011). Likewise, if we know a "causal" SNP is linked to a QTL, we can imagine three possible scenarios with respect to T1: (1) causal (SNP → T1 → T2), (2) reactive (SNP → T2 → T1), or (3) independent (T1 ← SNP → T2). Scenarios (1) and (2) do not cause pleiotropy but produce association.

A SEM methodology has the ability to handle complex genotype-phenotype maps in GWAS, placing an emphasis on causal networks (Li et al., 2006). Therefore, SEM-based GWAS (SEM-GWAS) may provide a better understanding of biological mechanisms and of relationships among a set of traits than MTM-GWAS. SEM can potentially decompose the total SNP effect on a trait into direct and indirect (i.e., mediated) contributions. However, SEM-derived GWAS has not yet been fully discussed or applied in quantitative genetic studies. Our objective was to illustrate the potential utility of SEM-GWAS by using three production traits in broiler chickens genotyped for a battery of SNPs as a case example.

#### MATERIALS AND METHODS

#### Data Set

The analysis included records for 1,351 broiler chickens provided by Aviagen Ltd. (Newbridge, Scotland) for three phenotypic traits: ultrasound of breast muscle (BM) at 35 days of age, body weight (BW), and hen-house egg production (HHP), defined as the total number of eggs laid between weeks 28 and 54 per bird. The sample consisted of 274 full-sib families, 326 sires, and 592 dams. More details regarding population and family structure were provided by Momen et al. (2017). A pre-correction procedure was performed on the phenotypes to account for systematic effects such as sex, hatch week, pen, and contemporary group for BM and BW. HHP was corrected for random hatch effects, with a general mean as the sole fixed effect.

Each bird was genotyped for 580,954 SNP markers with a 600k Affymetrix SNP chip (Kranis et al., 2013; Affymetrix, Inc., Santa Clara, CA, USA). The Beagle software program (Browning and Browning, 2007) was used to impute missing SNP genotypes, and quality control was performed using PLINK version 1.9 (Purcell et al., 2007). Markers with minor allele frequencies (MAF) < 1% or call rate < 95%, and those deviating from Hardy–Weinberg equilibrium (HWE; chi-square test p-value threshold of 10<sup>−6</sup>), were removed. The main reason for conducting the HWE test was to remove SNPs with potential genotyping error. Finally, 354,364 autosomal SNP markers were included in the analysis.
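The MAF and call-rate filters can be sketched in a few lines of Python. This is a minimal illustration only, not the PLINK procedure used in the study: the function names, the toy genotypes, and the use of `None` for a missing call are assumptions, and the HWE test is omitted.

```python
def maf(genotypes):
    """Minor allele frequency from 0/1/2 allele counts; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)


def call_rate(genotypes):
    """Fraction of birds with a non-missing genotype at this marker."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def passes_qc(genotypes, min_maf=0.01, min_call_rate=0.95):
    """Keep a marker only if it meets both thresholds (HWE test not included)."""
    return maf(genotypes) >= min_maf and call_rate(genotypes) >= min_call_rate


# Hypothetical genotypes for three markers scored on ten birds.
snps = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF = 0
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, 2, 1],  # call rate = 0.8
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)
```

In this toy case only `snp_a` survives: `snp_b` fails the MAF threshold and `snp_c` fails the call-rate threshold.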

#### Multiple-Trait Model for GWAS

MTM-GWAS is a single-trait GWAS model extended to multidimensional responses. When only considering additive effects of SNPs, the phenotype of a quantitative trait using the single-trait model can be described as:

$$
y_i = w_{ij} s_j + e_i \tag{1}
$$

where y<sub>i</sub> is the phenotypic trait of individual i, w<sub>ij</sub> is the number of A alleles (i.e., w<sub>ij</sub> ∈ {0, 1, 2}) in the genotype of individual i at SNP marker j (j = 1, . . . , p), s<sub>j</sub> is the allele substitution effect for SNP marker j, and e<sub>i</sub> is the residual. Strong LD between markers and QTLs coupled with an adequate marker density increases the chance of detecting marker and phenotype associations. Hypothesis testing is typically used to evaluate the strength of the evidence of a putative association. Typically, a t-test is applied to obtain p-values, and the statistic is T<sub>j</sub> = ŝ<sub>j</sub> / se(ŝ<sub>j</sub>), where ŝ<sub>j</sub> is the point estimate of the j-th SNP effect and se(ŝ<sub>j</sub>) is its standard error.
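As a rough illustration of the single-marker test, the sketch below regresses a phenotype on allele counts for one SNP and forms the t-statistic as the estimate divided by its standard error. The function name and the toy data are hypothetical, and an intercept is fitted by centering, which is an assumption added here rather than part of Equation (1).

```python
import math


def snp_t_statistic(w, y):
    """Ordinary least squares of y on allele counts w (with intercept).

    Returns the estimated substitution effect, its standard error,
    and the t-statistic (estimate divided by standard error).
    """
    n = len(w)
    w_bar = sum(w) / n
    y_bar = sum(y) / n
    sxx = sum((wi - w_bar) ** 2 for wi in w)
    sxy = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y))
    s_hat = sxy / sxx
    # Residual sum of squares around the fitted line.
    rss = sum((yi - y_bar - s_hat * (wi - w_bar)) ** 2 for wi, yi in zip(w, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return s_hat, se, s_hat / se


# Hypothetical allele counts and pre-corrected phenotypes for six birds.
w = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
s_hat, se, t_value = snp_t_statistic(w, y)
print(s_hat, se, t_value)
```

A p-value would then be obtained from a t distribution with n − 2 degrees of freedom; that step is left out to keep the sketch free of external libraries.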

The single locus model described above is naive for a complex trait because the data typically contain hidden population structure and individuals have varying degrees of genetic similarity (Listgarten et al., 2012; Gianola et al., 2016). Therefore, accounting for covariance structure induced by genetic similarity is expected to produce better inferences (Kennedy et al., 1992). Ignoring effects that reveal genetic relatedness inflates the residual terms and compromises the ability to detect association. A random effect g<sup>i</sup> , including a covariance matrix reflecting pairwise similarities between additive genetic effects of individuals, can be included to control population stratification. The similarity metrics can be derived from pedigree information or from whole-genome marker genotypes. This model, extended for analysis of t traits, is given by:

$$\mathbf{Y} = \mathbf{W}\mathbf{s} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{2}$$

where **Y** is the vector of pre-adjusted phenotypic values measured on each bird, **W**, as previously defined, represents the incidence matrix of genotype codes, **s** is the vector of additive marker effects, **g** is the vector of random polygenic effects, **g** ∼ N(**0**, Σ<sub>g</sub> ⊗ **K**), and **ε** represents the residual vector, **ε** ∼ N(**0**, Σ<sub>ε</sub> ⊗ **I**). Here ⊗ denotes the Kronecker product. The covariance matrices were:

$$
\begin{split}
\Sigma_{g} &= \begin{bmatrix}
\sigma_{g(\mathrm{BM})}^2 & \sigma_{g(\mathrm{BM},\mathrm{BW})} & \sigma_{g(\mathrm{BM},\mathrm{HHP})} \\
 & \sigma_{g(\mathrm{BW})}^2 & \sigma_{g(\mathrm{BW},\mathrm{HHP})} \\
\mathrm{Symmetric} & & \sigma_{g(\mathrm{HHP})}^2
\end{bmatrix} \text{ and} \\
\Sigma_{\varepsilon} &= \begin{bmatrix}
\sigma_{\varepsilon(\mathrm{BM})}^2 & \sigma_{\varepsilon(\mathrm{BM},\mathrm{BW})} & \sigma_{\varepsilon(\mathrm{BM},\mathrm{HHP})} \\
 & \sigma_{\varepsilon(\mathrm{BW})}^2 & \sigma_{\varepsilon(\mathrm{BW},\mathrm{HHP})} \\
\mathrm{Symmetric} & & \sigma_{\varepsilon(\mathrm{HHP})}^2
\end{bmatrix}.
\end{split}
$$
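To make the Kronecker structure of the polygenic covariance concrete, the sketch below builds the product of a trait covariance matrix and a relationship matrix for a toy case with two traits and two birds. The numeric values and the helper name `kron` are illustrative assumptions, not estimates from the study, and a trait-major ordering of the stacked vector is assumed.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical genetic (co)variances for two traits.
sigma_g = [[1.0, 0.5],
           [0.5, 2.0]]
# Hypothetical relationship matrix for two birds.
K = [[1.0, 0.25],
     [0.25, 1.0]]

cov_g = kron(sigma_g, K)  # 4 x 4: one 2 x 2 block per pair of traits
for row in cov_g:
    print(row)
```

Each 2 × 2 block is the relationship matrix scaled by the genetic (co)variance of the corresponding pair of traits, so related birds share polygenic effects both within and across traits.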

The positive definite matrix **K** may be a genomic relationship matrix (**G**) computed from marker data, or a pedigree-based matrix (**A**) computed from genealogical information. The **A** matrix describes the expected additive similarity among individuals, while **G** measures the realized fraction of alleles shared. Genomic relationship matrices can be derived in several ways (VanRaden, 2008; Yang et al., 2010; Forni et al., 2011). Here, we used the form proposed by VanRaden (2008):

$$\mathbf{G} = \frac{\mathbf{M}\mathbf{M}'}{2\sum_{j} p_{j}q_{j}}\tag{3}$$

where **M** is an n × p matrix of centered SNP genotypes and p<sub>j</sub> and q<sub>j</sub> = 1 − p<sub>j</sub> are the allele frequencies at marker locus j. We evaluated both **A** and **G** in the present study.
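Equation (3) can be illustrated with a small pure-Python sketch. It assumes that genotypes are coded as 0/1/2 allele counts, that allele frequencies are estimated from the sample itself, and that centering subtracts 2p<sub>j</sub> from each count; the function name and the toy data are hypothetical.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = MM' / (2 * sum_j p_j q_j).

    genotypes: list of rows (one per individual) of 0/1/2 allele counts.
    Allele frequencies are estimated from the sample (an assumption).
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_markers)]
    # Center each marker by twice its allele frequency.
    centered = [[row[j] - 2 * freqs[j] for j in range(n_markers)] for row in genotypes]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / denom for row_k in centered]
        for row_i in centered
    ]


# Hypothetical genotypes: four birds, three markers.
toy = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
G = vanraden_g(toy)
for row in G:
    print([round(v, 3) for v in row])
```

Because the markers are centered with sample frequencies, each row of the resulting matrix sums to zero, which is a convenient sanity check on the centering step.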

#### Structural Equation Model Association Analysis

A SEM consists of two essential parts: a measurement model and a structural model. The measurement model depicts the connections between observable variables and their corresponding latent variables (Anderson and Gerbing, 1988). The measurement model is also known as confirmatory factor analysis. The critical part of a SEM is the structural model, which can have three forms (Raykov and Marcoulides, 2012). The first consists of observable exogenous and endogenous variables. This model is a restricted version of a SEM known as path analysis (Wright, 1921). The second form explains the relationship between exogenous and endogenous variables that are only latent. The third type is a model consisting of both manifest and latent variables.

SEM can be applied to GWAS as an alternative to MTM-GWAS to study how different causal paths mediate SNP effects on each trait. The following SEM model was considered:

$$\mathbf{Y} = \Lambda\mathbf{Y} + \mathbf{W}\mathbf{s} + \mathbf{g} + \boldsymbol{\varepsilon} \tag{4}$$

where Λ is a t × t matrix of regression coefficients or structural coefficients (typically lower-triangular), specified according to the causal structure learned from the residuals, with zeros on the diagonal:

$$
\Lambda = \begin{bmatrix}
0 & 0 & 0 \\
\lambda_{(BM \to BW)} & 0 & 0 \\
\lambda_{(BM \to HHP)} & \lambda_{(BW \to HHP)} & 0
\end{bmatrix}
$$

The vectors **g** and **ε** are assumed to have the joint distribution

$$
\begin{bmatrix} \mathbf{g} \\ \boldsymbol{\varepsilon} \end{bmatrix} \sim N\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} \Sigma_{g} \otimes \mathbf{K} & \mathbf{0} \\ \mathbf{0} & \Psi \otimes \mathbf{I} \end{bmatrix} \right),
$$

and the residual covariance matrix is diagonal:

$$
\Psi = \begin{bmatrix}
\sigma_{\varepsilon(\mathrm{BM})}^2 & 0 & 0 \\
0 & \sigma_{\varepsilon(\mathrm{BW})}^2 & 0 \\
0 & 0 & \sigma_{\varepsilon(\mathrm{HHP})}^2
\end{bmatrix}.
$$

The remaining terms are as presented earlier with one important difference: the SNP effects are not interpreted as overall effects on trait t but instead represent direct effects on trait t. Additional indirect effects from the same SNP may be mediated by phenotypic traits in C. Each marker is entered into Equation (4) separately, and its significance is tested. For a discussion of how SEM represents genetic signals on each trait through multiple causal paths, see Wu et al. (2010) and Jamrozik and Schaeffer (2011). Despite the difference in interpretation, the distribution of the vector of polygenic effects is assumed to be the same as in the MTM-GWAS model. The same applies to residual terms within a trait. We also consider trait-specific residuals to be independent within an individual. This restriction is required to render structural coefficients likelihood-identifiable. In addition, the interpretation of inferences as having a causal meaning requires imposing the restriction that the residuals' joint distribution be interpreted as the causal sufficiency assumption (Pearl, 2009). In the present study, all exogenous and endogenous variables were observable, and there was no latent variable. Hence, causal structure was assumed between the endogenous variables BM, BW, and HHP.
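One way to see how Equation (4) carries both direct and mediated effects is to solve it for the phenotypes. The following is a sketch for a single bird i and a single SNP j, where **y**<sub>i</sub>, **s**<sub>j</sub>, **g**<sub>i</sub>, and **ε**<sub>i</sub> are taken as 3 × 1 vectors ordered as (BM, BW, HHP); this per-bird notation is introduced here for illustration only. Because Λ is strictly lower-triangular, Λ<sup>3</sup> = **0** and the inverse of **I** − Λ is a finite sum:

$$
\begin{aligned}
\mathbf{y}_i &= \Lambda\mathbf{y}_i + w_{ij}\mathbf{s}_j + \mathbf{g}_i + \boldsymbol{\varepsilon}_i \\
\Rightarrow\quad \mathbf{y}_i &= (\mathbf{I} - \Lambda)^{-1}\left(w_{ij}\mathbf{s}_j + \mathbf{g}_i + \boldsymbol{\varepsilon}_i\right), \\
(\mathbf{I} - \Lambda)^{-1} &= \mathbf{I} + \Lambda + \Lambda^{2} =
\begin{bmatrix}
1 & 0 & 0 \\
\lambda_{(BM \to BW)} & 1 & 0 \\
\lambda_{(BM \to HHP)} + \lambda_{(BW \to HHP)}\lambda_{(BM \to BW)} & \lambda_{(BW \to HHP)} & 1
\end{bmatrix}
\end{aligned}
$$

Under these assumptions, **s**<sub>j</sub> holds the direct SNP effects, and the entries of (**I** − Λ)<sup>−1</sup>**s**<sub>j</sub> are the overall effects, that is, the direct effect on each trait plus everything that reaches it through upstream traits.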

We considered the following GWAS models, with causal structures recovered by the inductive causation (IC) algorithm (Pearl, 2009): (1) MTM-GWAS with pedigree-based kinship **A** (MTM-A) or marker-based kinship **G** (MTM-G), and (2) SEM-GWAS with **A** (SEM-A) or **G** (SEM-G). Although nuisance covariates such as environmental factors can be omitted in the graph, they may be incorporated into the models as exogenous variables. The SEM representation allowed us to decompose SNP effects into direct, indirect, and total effects.

A direct SNP effect is the path coefficient between a SNP as an exogenous variable and a dependent variable without any causal mediation by any other variable. The indirect effects of a SNP are those mediated by at least one other intervening endogenous variable. Indirect effects are calculated by multiplying path coefficients for each path linking the SNP to an associated variable, and then summing over all such paths (Mi et al., 2010a; Jiang et al., 2013). The overall effect is the sum of all direct and indirect effects. By explicitly accounting for complex relationship structure among traits in such a way, SEM provides a better understanding of a genome-wide SNP analysis by allowing us to decompose effects into direct, indirect, and overall effects within a predefined causal framework (Nock and Zhang, 2011). MTM-GWAS and SEM-GWAS were compared with the logarithm of the likelihood function (log L), Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The model providing the lowest values for these information criteria is considered to fit the data better. MTM-GWAS and SEM-GWAS were fitted using the SNP Snappy strategy (Meyer and Tier, 2012), which is implemented in the Wombat software program (Meyer, 2007). The outputs were a vector of multiple SNP effect estimates, ŝ = [ŝ<sub>BM</sub>, ŝ<sub>BW</sub>, ŝ<sub>HHP</sub>], with corresponding standard errors and respective t-values.
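For readers less familiar with the information criteria, the sketch below shows the textbook definitions of AIC and BIC computed from a log-likelihood. The log-likelihood values and parameter counts are made up for illustration, and the exact conventions Wombat applies (for example, under REML) may differ from these generic formulas.

```python
import math


def aic(log_l, n_params):
    """Akaike's Information Criterion: -2 log L + 2k."""
    return -2.0 * log_l + 2.0 * n_params


def bic(log_l, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + k ln(n)."""
    return -2.0 * log_l + n_params * math.log(n_obs)


# Hypothetical fits for two candidate models on 1,351 birds.
candidates = {
    "model_1": {"log_l": -100.0, "n_params": 3},
    "model_2": {"log_l": -99.5, "n_params": 6},
}
n_birds = 1351
scores = {
    name: (aic(m["log_l"], m["n_params"]), bic(m["log_l"], m["n_params"], n_birds))
    for name, m in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
print(scores, best_by_aic)
```

In this toy comparison the small gain in log-likelihood of `model_2` does not offset its three extra parameters, so `model_1` has the lower AIC and BIC.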

#### Searching for a Phenotypic Causal Network in a Mixed Model

In the SEM-GWAS formulation described earlier, the structure of the underlying causal phenotypic network needs to be known. Because this is not so in practice, we used a causal inference algorithm to infer the structure. Residuals are assumed to be independent in all SEM analyses, so associations between observed traits are viewed as due to causal links between traits and to correlations among genetic values (i.e., g<sub>1</sub>, g<sub>2</sub>, and g<sub>3</sub>). Thus, to eliminate confounding problems when inferring the underlying network among traits, we used the approach of Valente et al. (2010) to search for acyclic causal structures through conditional independencies on the distribution of the phenotypes, given the genetic effects. A causal phenotypic network was inferred in two stages: (1) an MTM model (Henderson and Quaas, 1976) was employed to estimate covariance matrices of additive genetic effects and of residuals, and (2) the causal structure among phenotypes from the covariance matrix between traits, conditionally on additive genetic effects, was inferred by the IC algorithm. The residual (co)variance matrix was inferred using Bayesian Markov-chain Monte Carlo (Valente et al., 2010; Wu et al., 2010), with samples drawn from the posterior distribution. The reason for our use of the residual (co)variances is that the residual structure could bear information from the joint distribution of all phenotypic traits conditional on their polygenic effects, such that they correct the confounding issues caused by such effects when the traits are genetically correlated (Pearl, 2009). For each query testing statistical independence between traits y<sub>t</sub> and y<sub>t′</sub>, the posterior distribution of the residual partial correlation ρ<sub>y<sub>t</sub>, y<sub>t′</sub> | h</sub> was obtained, where h is a set of variables (traits) that are independent. Three highest posterior density (HPD) intervals of 0.75, 0.85, and 0.95 were used to make statistical decisions for SEM-GWAS.
We thus considered SEM-A75 (HPD > 0.75), SEM-A85 (HPD > 0.85), SEM-A95 (HPD > 0.95), and SEM-G75 (HPD > 0.75). An HPD interval that does not contain zero declares y<sub>t</sub> and y<sub>t′</sub> to be conditionally dependent.
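The HPD-based decision rule can be sketched as follows. This is a simplified illustration, assuming a unimodal posterior summarized by MCMC draws and taking the HPD interval as the shortest window of sorted draws that covers the requested probability; the function names and the draws themselves are made up and are not output from the study.

```python
import math


def hpd_interval(samples, prob):
    """Shortest interval containing a fraction `prob` of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must cover
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def conditionally_dependent(samples, prob):
    """Declare dependence when the HPD interval excludes zero."""
    lower, upper = hpd_interval(samples, prob)
    return not (lower <= 0.0 <= upper)


# Hypothetical posterior draws of a residual partial correlation.
draws = [-0.05, -0.01, 0.02, 0.04, 0.05, 0.06, 0.06, 0.07, 0.07, 0.08,
         0.08, 0.08, 0.09, 0.09, 0.10, 0.10, 0.11, 0.12, 0.13, 0.15]

for prob in (0.75, 0.85, 0.95):
    print(prob, hpd_interval(draws, prob), conditionally_dependent(draws, prob))
```

With these draws the edge is retained at the 0.75 level but dropped at 0.95, which mirrors how a more stringent HPD cutoff yields a sparser inferred network.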

## RESULTS

**Figure 1** shows phenotypic relationship structures recovered by the IC algorithm for the three different HPD intervals. Edges connecting two traits represent non-null partial correlations as indicated by HPD intervals. We compared the two MTM-GWAS and four SEM-GWAS by using the three chicken traits (BW, BM, and HHP). Fully recursive (there is at least one incoming OR outgoing edge for each node) SEM-A75 and SEM-G75 graphs revealed direct effects of BM on BW and HHP, and those of BW on HHP, as well as an indirect effect of BM on HHP. In addition, SEM-A85 detected a direct effect of BM on BW, the direct effect of BW on HHP, and the indirect effect of BM on HHP mediated by BW. Finally, SEM-A95 only identified a direct effect of BM on BW because of a statistically stringent HPD cutoff imposed. SEM-G85 and SEM-G95 were not explored further because they produced the same results as SEM-A85 and SEM-A95.

Given the causal structures inferred from the IC algorithm, the following SEM was fitted:

$$\begin{cases} \mathbf{y}_1 = \mu + \mathbf{Z}\mathbf{g}_1 + \mathbf{w}_j s_{j1} + \boldsymbol{\varepsilon}_1\\ \mathbf{y}_2 = \mu + \lambda_{21} \mathbf{y}_1 + \mathbf{Z}\mathbf{g}_2 + \mathbf{w}_j s_{j2} + \boldsymbol{\varepsilon}_2\\ \mathbf{y}_3 = \mu + \lambda_{31} \mathbf{y}_1 + \lambda_{32} \mathbf{y}_2 + \mathbf{Z}\mathbf{g}_3 + \mathbf{w}_j s_{j3} + \boldsymbol{\varepsilon}_3 \end{cases} \tag{5}$$

Note that only a small number of the entries in the structural coefficient matrix (λ in Equation 5) are non-zero due to sparsity. These non-zero entries specify the effect of one phenotype on other phenotypes. The corresponding directed acyclic graph is shown in **Figure 2** assuming the causal relationships among the three traits, where y<sub>1</sub>, y<sub>2</sub>, and y<sub>3</sub> represent BM, BW, and HHP, respectively; SNP<sub>j</sub> is the genotype of the j-th SNP; s<sub>jl</sub> is the direct SNP effect on trait l (l = 1, 2, 3); and the remaining variables are as presented earlier. This diagram depicts a fully recursive structure in which all recursive relationships among the three phenotypic traits are shown. Arrows represent causal connections, whereas double-headed arrows between polygenic effects are correlations.

We examined the fit of each model implemented to assess how well it describes the data (**Table 1**). Varona et al. (2007) and, more recently, Valente et al. (2013) showed that re-parametrization and reduction of a SEM mixed model yield the same joint probability distribution of observations as in MTM, suggesting that the expected likelihood of SEM and MTM should be similar. As expected, SEM-GWAS and MTM-GWAS showed very similar results (e.g., SEM-A75 vs. MTM-A and SEM-G75 vs. MTM-G). Among the models considered, those involving **G** exhibited slightly better fits. SEM-A85 and SEM-A95, sharing a subset of the SEM-A75 structure, presented almost identical AIC and BIC values. Since these results imply that the recursive model and standard mixed model for GWAS are statistically equivalent in terms of the fitting criteria, the focus of the remainder of the analysis will be on the modeling of SNP (or QTL) effects in the SEM context (SEM-A75 or SEM-G75) as an extension of MTM, which accounts for recursive links among the three measured traits.

FIGURE 1 | Causal graphs inferred using the IC algorithm among three traits: breast meat (BM), body weight (BW), and hen-house production (HHP) in the chicken data. SEM-A75 and SEM-G75 were the inferred fully recursive causal structures with HPD > 0.75 and corrected for genetic confounder using A (pedigree-based) and G (marker-based) matrices. SEM-A85 and SEM-A95 were obtained with HPD > 0.85 and HPD > 0.95, respectively, corrected with A. Arrows indicate direction of causal relationships. Dashed lines indicate negative coefficients, and the continuous arrows indicate positive coefficients.

#### Structural Coefficients

**Table 2** presents the causal structural path coefficients for endogenous variables (BM, BW, and HHP). All models have positive effects for BM→ BW, whereas the BM→ HHP and BW→ HHP relationships have negative path coefficients. The latter confirmed the fact that chicken breeding is divided into broiler and layer sections due to the negative genetic correlation between BW and HHP.

Also shown in **Table 2** are the magnitudes of the SEM structural coefficients reflecting the intensity of the causality. The positive coefficient λ<sub>21</sub> quantifies the (direct) causal effect of BM on BW. This suggests that a 1-unit increase in BM results in a λ<sub>21</sub>-unit increase in BW. Likewise, the negative causal effects λ<sub>31</sub> and λ<sub>32</sub> offer the same interpretation.

#### Decomposition of SNP Effect Paths Using a Fully Recursive Model

We can decompose SNP effects into direct and indirect effects using **Figure 2**. The direct effect of SNP j on y<sub>3</sub> (HHP) is given by d<sub>SNPj→y3</sub>: ŝ<sub>j(y3)</sub>, where d denotes the direct effect. Note that there is only one direct path but several indirect paths. We find three indirect paths from SNP<sub>j</sub> to y<sub>3</sub> mediated by y<sub>1</sub> and y<sub>2</sub> (i.e., the nodes formed by other traits). The first indirect effect is ind(1)<sub>SNPj→y3</sub>: λ<sub>32</sub>(λ<sub>21</sub>ŝ<sub>j(y1)</sub>), in the path mediated by y<sub>1</sub> and y<sub>2</sub>, where ind denotes the indirect effect. The second indirect effect, ind(2)<sub>SNPj→y3</sub>: λ<sub>32</sub>ŝ<sub>j(y2)</sub>, is mediated by y<sub>2</sub>. The last indirect effect, ind(3)<sub>SNPj→y3</sub>: λ<sub>31</sub>ŝ<sub>j(y1)</sub>, is mediated by y<sub>1</sub>. Therefore, the overall effect is given by summing all four paths, T<sub>SNPj→y3</sub>: λ<sub>32</sub>(λ<sub>21</sub>ŝ<sub>j(y1)</sub>) + λ<sub>32</sub>ŝ<sub>j(y2)</sub> + λ<sub>31</sub>ŝ<sub>j(y1)</sub> + ŝ<sub>j(y3)</sub>. The fully recursive model of the overall SNP effect is then:

$$\begin{cases} T_{\hat{s}_{j \rightarrow y_1}} : \hat{s}_{j(y_1)} \\ T_{\hat{s}_{j \rightarrow y_2}} : \lambda_{21} \left( \hat{s}_{j(y_1)} \right) + \hat{s}_{j(y_2)} \\ T_{\hat{s}_{j \rightarrow y_3}} : \lambda_{32} \left[ \lambda_{21} \left( \hat{s}_{j(y_1)} \right) + \hat{s}_{j(y_2)} \right] + \lambda_{31} \left( \hat{s}_{j(y_1)} \right) + \hat{s}_{j(y_3)} \end{cases} \tag{6}$$
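Equation (6) can be checked numerically with a short sketch. The function name and all numeric values below are hypothetical and chosen only to match the signs reported for the structural coefficients (positive for BM → BW, negative for BM → HHP and BW → HHP); they are not estimates from the study.

```python
def decompose_snp_effects(s_hat, lam21, lam31, lam32):
    """Split SNP effects into direct, indirect, and overall parts.

    s_hat: direct effect estimates on (BM, BW, HHP) for one SNP.
    lam21, lam31, lam32: structural coefficients for BM->BW, BM->HHP, BW->HHP.
    """
    s1, s2, s3 = s_hat
    direct = {"BM": s1, "BW": s2, "HHP": s3}
    indirect = {
        "BM": 0.0,                    # no upstream trait
        "BW": lam21 * s1,             # SNP -> BM -> BW
        "HHP": lam32 * lam21 * s1     # SNP -> BM -> BW -> HHP
               + lam32 * s2           # SNP -> BW -> HHP
               + lam31 * s1,          # SNP -> BM -> HHP
    }
    overall = {trait: direct[trait] + indirect[trait] for trait in direct}
    return direct, indirect, overall


# Hypothetical estimates for a single SNP.
direct, indirect, overall = decompose_snp_effects(
    s_hat=(0.5, 0.2, -0.1), lam21=2.0, lam31=-0.5, lam32=-0.25
)
print(direct)
print(indirect)
print(overall)
```

Here the SNP has a modest direct effect on BW (0.2) but a larger mediated effect through BM (1.0), so a model reporting only the overall effect (1.2) would hide where the signal originates.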

For y<sub>1</sub> (BM), there is only one effect, so the overall effect is equal to the direct effect. For y<sub>2</sub> (BW) and y<sub>3</sub> (HHP), direct and indirect SNP effects are involved. There are two paths for y<sub>2</sub>: one indirect, ind<sub>ŝj→y2</sub>: ŝ<sub>j(y1)</sub> → y<sub>1</sub> → y<sub>2</sub>, and one direct, d<sub>ŝj→y2</sub>: ŝ<sub>j(y2)</sub> → y<sub>2</sub>. Here, the SNP effect is direct and mediated through other phenotypes according to causal networks in SEM-GWAS (**Figures 1**, **2**). For instance, the overall SNP effect for y<sub>3</sub>, decomposed into four direct and indirect paths, is T<sub>ŝj→y3</sub>: λ<sub>32</sub>λ<sub>21</sub>ŝ<sub>j(y1)</sub> + λ<sub>32</sub>ŝ<sub>j(y2)</sub> + λ<sub>31</sub>ŝ<sub>j(y1)</sub> + ŝ<sub>j(y3)</sub>.

A, pedigree-based relationship matrix; G, VanRaden's marker-based relationship matrix. MTM-A and MTM-G denote MTM-GWAS models coupled with the A and G matrices, respectively. SEM-A75, SEM-A85, and SEM-A95 represent SEM-GWAS models with HPD > 75, 85, and 95% values with the A matrix, respectively. SEM-G75 is a SEM-GWAS model with HPD > 75% coupled with the G matrix.

TABLE 2 | Estimates of three causal structural coefficients (λ) derived from four different structural models.

BM, breast meat; BW, body weight; HHP, hen-house production. SEM-A75: HPD > 0.75. SEM-G75: HPD > 0.75. SEM-A85: HPD > 0.85. SEM-A95: HPD > 0.95. \*\*\*Indicates that the path coefficient was not estimated because there was no corresponding path in the inferred structure.

The scatter plots in **Figure 3** compare the estimated total effects for HHP (T<sub>ŝj→y3</sub>) obtained from SEM-GWAS with those from MTM-GWAS. We observed good agreement between SEM-GWAS and MTM-GWAS. The total SNP signals derived from SEM and MTM are the same, but SEM provides biologically relevant additional information.

**Figures S1**–**S4** present scatter plots of MTM-GWAS and SEM-GWAS signals (SEM-A75, SEM-G75, SEM-A85, and SEM-A95) for the BM → BW path, which was a common path across all SEM-GWAS considered. These two traits have a genetic correlation of 0.5 (results not shown). We partitioned the SEM causal link into direct, indirect, and overall effects based on directed links inferred from the IC algorithm with HPD > 0.85, whereas MTM-GWAS captures an overall SNP effect on BW. Scatter plots of the overall effects from SEM-GWAS and those of the total effects from MTM-GWAS indicated almost perfect agreement (top left plots, **Figures S1**–**S4**). We also observed concomitance between estimated overall and direct effects (top right plots, **Figures S1**–**S4**). In contrast, there was less agreement in the magnitude of the SNP effects when comparing overall vs. indirect effects (bottom left plots, **Figures S1**–**S4**). There was no linear relationship between the indirect and direct SNP effects (bottom right plots, **Figures S1**–**S4**). In short, genetic signals detected in SEM-GWAS were close to those of MTM-GWAS for overall effects because both models are based on a multivariate approach with the same covariance matrix. In all SEM-GWAS, results showed that direct effects contributed to overall effects more than the indirect effects.

### Manhattan Plot of Direct, Indirect, and Overall SNP Effects

**Figure 4** depicts a Manhattan plot summarizing the magnitude of direct (SEM-A75), indirect (SEM-A75), and overall SNP effects (MTM-A75). We plotted the decomposed SNP effects on BW along chromosomes to visualize estimated marker effects from SEM-GWAS and MTM-GWAS. The indirect and direct effects provide a view of SNP effects from a perspective that is not available for the total effect of MTM-GWAS. For instance, there were two estimated SNP effects on chromosomes 1 and 2 that deserve particular attention. These two SNPs are highlighted with black circles and red ovals. The overall effect of the first SNP consisted of large indirect and small direct effects on BW, whereas the opposite pattern was observed for the second SNP, which showed large direct and small indirect effects. Although the overall effects of these SNPs were similar (top Manhattan plot, **Figure 4**), use of decomposition allowed us to determine that the trait of interest is affected in different manners: the second SNP effect acted directly on BW without any mediation by BM, whereas the first SNP reflected a large effect mediated by BM on BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

The corresponding Manhattan plot based on –log<sub>10</sub> (p-values) is shown in **Figure S5**. As with the magnitude of effect sizes, the results showed that –log<sub>10</sub> (p-values) of estimated overall effects from SEM-A75 and those from MTM-A75 yielded the same significant peaks. We found that some significant indirect SNP effects reached genome-wide significance after correction for multiple testing using a 5% FDR threshold level (2.752). The most significant SNPs were on chromosomes 1 and 4 (GGA1 and GGA4).
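As a rough illustration of how an FDR-based significance threshold can be derived, the sketch below applies the Benjamini-Hochberg step-up rule. This is an assumption: the text does not state which FDR procedure was used, and the p-values are hypothetical.

```python
import math


def bh_cutoff(p_values, fdr=0.05):
    """Largest p-value declared significant by Benjamini-Hochberg, or None."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


# Hypothetical p-values for eight SNPs (illustration only).
p_values = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.5, 0.7, 0.9]
cutoff = bh_cutoff(p_values)
# Express the cutoff on the -log10 scale used in Manhattan plots.
neg_log10_cutoff = -math.log10(cutoff)
```

SNPs whose –log<sub>10</sub> (p-value) is at or above `neg_log10_cutoff` would be declared significant at the chosen FDR level under this rule.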

As an illustration, the six most significant SNPs with the highest –log<sub>10</sub> (p-values) for each type of decomposed SNP effect are presented in **Table 3**. Seven candidate genes were identified near the significant SNPs derived from the SNP effects decomposition, with two on GGA7 (OLA1 and ZNF385B), one on GGA3 (EPHA7), three on GGA4 (LOC422264, LOC422265, and MAEA), and one on GGA14 (GRIN2A). We found that only genes on GGA4 and GGA1 are linked to significant indirect SNP effects that impact HHP. Some studies reported QTLs for BM on GGA1 and for BW on GGA4, stating that these genomic regions contain QTLs related to abdominal fat and growth traits that were detected across diverse chicken populations (Sun et al., 2013; Van Goor et al., 2015). One of the two detected genes on GGA14, i.e., GRIN2A, which was linked to the SNP Gga\_rs313620413, showed significant direct and overall SNP effects using SEM as well as MTM. Collectively, the Gga\_rs15390496, Gga\_rs16591372, and Gga\_rs313620413 SNPs on GGA3, GGA7, and GGA14, which were linked to EPHA7, OLA1, and GRIN2A, respectively, point to candidate genes identified from overall effects of both SEM and MTM (**Table 3**).

We noted that the six SNPs selected according to the –log<sub>10</sub> (p-values) from the direct effect on HHP (i.e., dSNP<sub>j→y(HHP)</sub>) had small indirect effects ranging from −0.9018 to 0.2983. These indirect effects were negligible compared with their corresponding direct and total effects. Also, exploring the indirect effect sizes of the six most significant SNPs showed that indirect effects that are transmitted through inferred causal networks have the ability to change the magnitude of overall SNP effects, even changing them to the opposite direction (i.e., from positive to negative or vice versa).

It should also be noted that the estimated additive SNP effects obtained from the four SEM-GWAS can be used for inferring pleiotropy. For instance, a pleiotropic QTL may have a large


TABLE 3 | Six most significant SNPs selected according to –log<sub>10</sub> (p-values) and their effects, using the full recursive SEM (SEM-A75) and MTM (MTM-A75).

dS<sub>j→y(HHP)</sub>, indS<sub>j→y(HHP)</sub>, TS<sub>j→y(HHP)</sub>, and MTMS<sub>j→y(HHP)</sub> represent the direct, indirect, and overall effects from SEM and the overall effect from MTM of the j-th SNP on HHP, respectively. The bold values are –log<sub>10</sub> (corrected p-value) for each category of significant SNP effects.

positive direct effect on BW but may exhibit a negative indirect effect coming from BM, which in turn reduces the total QTL effect on BW. Arguably, the methodology employed here would be most effective when the direct and indirect effects of a QTL are in opposite directions. If the direct and indirect QTL effects are in the same direction, the power of SEM-GWAS may be the same as the overall power of MTM-GWAS. The overall effect (TŜ<sub>j→y(HHP)</sub>) of a given SNP consisted of large indirect (indŜ<sub>j→y(HHP)</sub>) and small direct (dŜ<sub>j→y(HHP)</sub>) effects on HHP, as observed for the most significant indirect SNPs localized on GGA4 and GGA1, whereas the opposite pattern was observed for the most significant direct SNPs on GGA3, GGA7, and GGA14, which showed large direct and small indirect effects. Although the overall effects of these SNPs from SEM-GWAS and MTM-GWAS were similar, the use of decomposition allowed us to determine that the trait of interest is affected in different manners. For instance, a given SNP effect may largely act directly on HHP without any mediation by BM and BW, whereas another SNP may be transmitting a large effect through a causal path mediated by BM and BW. Collectively, new insight regarding the direction of SNP effects can be obtained using the SEM-GWAS methodology.

#### DISCUSSION

It is becoming increasingly common to analyze a set of traits simultaneously in GWAS by leveraging genetic correlations between traits (Gao et al., 2014; Wu and Pankow, 2017). In the present study, we illustrated the potential utility of a SEM-based GWAS approach for causal inference and mediation analysis of SNP effects, which has the potential advantage of embedding a pre-inferred causal structure across phenotypic traits (Valente et al., 2010). SEM-GWAS, as an extension of standard MTM, accounts for recursive linking of mediating variables that could be either dependent or independent with restriction on a residual covariance. This is a useful approach when multiple mediators influence the final outcomes via either common or distinct biological pathways (Barfield et al., 2017; Bellavia and Valeri, 2017). SEM-GWAS is achieved by first inferring the structure of networks between phenotypic traits. For this purpose, we used a version of the IC algorithm described by Pearl (2009) and modified for implementation in quantitative genetics by Valente et al. (2010). The IC algorithm was used to explore putative causal links among phenotypes obtained from a residual covariance matrix, in a model that accounted for systematic and genetic confounding factors such as polygenic additive effects. It then produced a posterior distribution of partial residual correlations between any possible pairs of variables. Three different causal path diagrams were inferred from HPD intervals of 0.75, 0.85, and 0.95. We observed that the number of identified paths decreased with an increase in the HPD interval value. Only a path connecting BM and BW was present in all HPD intervals considered. Moreover, we found that the partial residual correlation between BM and HHP was weaker than that between BM and BW. This may explain why the path between BM and HHP was not detected with HPD intervals larger than 0.75.

The primary purpose of estimating the goodness of fit criteria was to determine whether full recursive SEM and MTM models with different assumptions yield the same or nearly the same BIC and AIC scores. Because our results showed that SEM and MTM produced nearly the same goodness of fit criteria, we conclude that the essential difference between these models cannot be articulated in terms of an expressive power of joint distributions or goodness of fit (Valente et al., 2013).

Estimated path coefficients reflect the strength of each causal link, quantifying the proportion of direct and indirect effects of a given SNP or gene on the outcome of interest via the mediator phenotypic traits or the predefined causal pathway between a set of mediators and the target outcome. For instance, a positive path coefficient from BM to BW suggests that a unit increase in BM directly results in an increase in BW. Our results showed that MTM-GWAS and SEM-GWAS were similar in terms of the goodness of fit as per the AIC and BIC criteria. This finding is in agreement with the theoretical work of Gianola and Sorensen (2004) and Varona et al. (2007) showing the equivalence between models. Thus, MTM-GWAS and SEM-GWAS produced the same marginal phenotypic distributions and goodness of fit values. A similar approach has been proposed by Li et al. (2006), Mi et al. (2010b), and Wang and van Eeuwijk (2014). The main difference between our approach and theirs is that they used SEM in the context of standard QTL mapping, whereas our SEM-GWAS is developed for GWAS based on a linear mixed model.

The results obtained in this study using the three economic traits in chickens suggest that causal inference and the SEM framework can be used for a set of phenotypes by considering both the raw and partial correlation relationships among traits in breeding programs. For example, in model SEM-A85, BM and HHP are unconditionally independent. However, conditioning on BW results in a non-zero partial correlation. Conditioning on BW breaks the causal chain from BM to HHP as observed in the case of full recursive models (SEM-A75 and SEM-G75) and their partial correlation becomes non-zero. This indicates that when all three variables are causally connected, both raw and partial correlations will all be non-zero, but they will change the magnitude depending on the signs of the path coefficients.
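The contrast between raw and partial correlations that underlies this kind of structure search can be illustrated with the first-order partial correlation formula. The two three-trait structures and all correlation values below are hypothetical textbook cases, not the structures or estimates inferred in this study.

```python
import math


def partial_correlation(r_xz, r_xy, r_yz):
    """Correlation between x and z after conditioning on y."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1.0 - r_xy**2) * (1.0 - r_yz**2))


# Hypothetical chain x -> y -> z: x and z are correlated only through y,
# so the raw correlation is non-zero and the partial correlation vanishes.
chain_raw = 0.25
chain_partial = partial_correlation(r_xz=0.25, r_xy=0.5, r_yz=0.5)

# Hypothetical common effect x -> y <- z: x and z are uncorrelated,
# but conditioning on y induces a non-zero partial correlation.
r = 1.0 / math.sqrt(3.0)
collider_raw = 0.0
collider_partial = partial_correlation(r_xz=0.0, r_xy=r, r_yz=r)
```

In the chain, conditioning on the middle trait removes the association; in the common-effect structure, conditioning creates one. Which pattern applies to a given pair of traits depends on the inferred network.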

The advantage of SEM-GWAS over MTM-GWAS is that the former decomposes SNP effects by tracing inferred causal networks. Our results showed that by partitioning SNP effects into direct, indirect, and total components, an alternative perspective of SNP effects can be obtained. As shown in **Table 3** and **Figure 4**, direct and indirect effects may differ in magnitude and sign, acting in the same direction or in an antagonistic manner. Note that the total SNP effects inferred from SEM-GWAS were the same as the estimated SNP effects from MTM-GWAS (**Table 3**). However, knowledge derived from the decomposition of SNP effects may be critical for animal and plant breeders in breaking unfavorable indirect QTL effects by reducing the frequency of undesired alleles or obtaining better SNP effect estimates than those from MTM-GWAS (e.g., Mi et al., 2010b).

## CONCLUSION

SEM offers insights into how phenotypic traits relate to each other. We illustrated potential advantages of SEM-GWAS relative to the commonly used standard MTM-GWAS by using three chicken traits as an example. SNP effects pertaining to SEM-GWAS have a different meaning than those in MTM-GWAS. Our results showed that SEM-GWAS enabled the identification of whether a SNP effect is acting directly or indirectly, i.e., mediated, on a given trait. In contrast, MTM-GWAS only captures overall genetic effects on traits, which is equivalent to combining direct and indirect SNP effects from SEM-GWAS together. Thus, SEM-GWAS offers more information and provides an alternative view of putative causal networks, enabling a better understanding of the genetic quiddity of traits at the genomic level.

## AUTHOR CONTRIBUTIONS

MM carried out the study and wrote the first draft of the manuscript. GR and DG designed the experiment, supervised the study, and critically contributed to the final version of the manuscript. GM contributed to the interpretation of results, provided critical insights, and revised the manuscript. BV and AA participated in discussion and reviewed the manuscript. MA, AK, and RM contributed materials and revised the manuscript. All authors read and approved the final manuscript.

## ACKNOWLEDGMENTS

MM wishes to acknowledge the Ministry of Science, Research and Technology of Iran for financially supporting his visit to the University of Wisconsin-Madison. Work was partially supported by the Wisconsin Agriculture Experiment Station under hatch grant 142-PRJ63CV to DG.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00455/full#supplementary-material

## REFERENCES


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Momen, Ayatollahi Mehrgardi, Amiri Roudbar, Kranis, Mercuri Pinto, Valente, Morota, Rosa and Gianola. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Parameters for Resistance to Non-specific Diseases and Production Traits Measured in Challenging and Selection Environments; Application to a Rabbit Case

Mélanie Gunia<sup>1</sup>\*, Ingrid David<sup>1</sup>, Jacques Hurtaud<sup>2</sup>, Mickaël Maupin<sup>2</sup>, Hélène Gilbert<sup>1</sup> and Hervé Garreau<sup>1</sup>

#### Edited by:

Andrea B. Doeschl-Wilson, Roslin Institute, University of Edinburgh, United Kingdom

#### Reviewed by:

Allan Schinckel, Purdue University, United States

Adriana Santana Carmo, Deoxi Biotecnologia, Brazil

> \*Correspondence: Mélanie Gunia melanie.gunia@inra.fr

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 23 April 2018 Accepted: 24 September 2018 Published: 16 October 2018

#### Citation:

Gunia M, David I, Hurtaud J, Maupin M, Gilbert H and Garreau H (2018) Genetic Parameters for Resistance to Non-specific Diseases and Production Traits Measured in Challenging and Selection Environments; Application to a Rabbit Case. Front. Genet. 9:467. doi: 10.3389/fgene.2018.00467

<sup>1</sup> GenPhySE, INRA, ENVT, Université de Toulouse, Castanet Tolosan, France, <sup>2</sup> HYPHARM SAS, La Corbière, Roussay, Sèvremoine, France

Breeding for disease resistance is a challenging but increasingly necessary objective to overcome the issues with the reduced use of antibiotics and growing concern for animal welfare while limiting economic losses. However, implementing such strategies is a complex process because animals face numerous diseases, and the environments on selection farms differ from those on commercial farms. We evaluated whether selection for resistance to non-specific diseases based on a single visual record in selection (S) and challenging (Ch) environments is possible. Records from 23,773 purebred rabbits born between 2012 and 2016 were used in this study. After weaning (at 32 days of age), 17,712 rabbits were raised in the S environment and 6,061 sibs were raised in the Ch environment. Clinical signs of disease were recorded for all animals at the end of the test, at a single time point, at 70 or 80 days of age. The causes of mortality occurring before the end of the test were also recorded. Three disease traits were analyzed: signs of respiratory disease, signs of digestive disease, and a composite trait (Resist) taking into account signs of digestive, respiratory and various infectious diseases. This latter composite trait is proposed to capture the global resistance to disease. All disease traits were binary, with 0 being the absence of symptoms. Two production traits were also recorded: the number of kits born alive (4,121 litters) and the weaning weight (13,090 rabbits). Disease traits were analyzed with animal threshold models, assuming that traits are different in the two environments. Bivariate analyses were carried out using linear animal models. The heritabilities of the disease traits ranged from 0.04 ± 0.01 to 0.11 ± 0.03. The genetic correlations between disease traits in both environments were below unity (≤ 0.84), indicating genotype by environment interactions. 
Most of the genetic correlations between disease and production traits were not significantly different from zero, except between the weaning weight and Resist\_S, with a favorable correlation of −0.34 ± 0.12. Given these genetic parameters, for the same level of exposure of rabbits to pathogens, the expected response to selection is a reduction of disease incidence of 4–6% per generation.

Keywords: heritability, genetics, resistance to disease, farming, rabbit, genetic parameters, genotype-environment interaction

#### INTRODUCTION

Breeding for disease resistance is becoming increasingly important to reduce the use of antibiotics and address the growing concern for animal welfare. It also contributes to reducing production costs at both the selection and commercial levels (Phocas et al., 2016). Improving disease resistance by selection is challenging. During their lifetime, animals can face various pathogens, many of which are not always identified. In addition, when the selection environment differs considerably from commercial environments (i.e., the higher biosecurity level of selection environments entails a lower expression of disease) little or no selection pressure is applied on this trait. To implement such selection, there is still a need for phenotypes that can be easily measured, at a reasonable cost (Merks et al., 2012). In rabbits, previous studies have shown that simple health records, measured once on growing animals of the selection nucleus, can be used to improve disease resistance (Eady et al., 2007; Garreau et al., 2012; Gunia et al., 2015). However, it is not known if such selection will be beneficial for maintaining the health and productivity of animals reared in commercial conditions. To address this issue, disease symptoms were recorded in rabbits at Hypharm's facilities in both a selection environment and more challenging environments. The aim of this study was to determine whether selection for resistance to non-specific diseases is possible, and how records from different environments can improve the genetic gains. We first estimated the genetic parameters of disease resistance traits in two contrasting environments and their genetic and phenotypic correlations with the main production traits. Then, we assessed the expected genetic progress for disease resistance for various selection strategies, including records from different environments in the genetic evaluations.

#### MATERIALS AND METHODS

This study was carried out in accordance with the national regulations of agriculture in the framework of the selective breeding schemes of the Hypharm breeding company.

#### Animals

Data were collected for animals of the AGP77 maternal rabbit line (Hypharm, Roussay, France). Records from 23,773 purebred rabbits born between 2012 and 2016 were analyzed. This line was created from rabbits of the New Zealand breed in 1975. Animals have been selected for litter size and weaning weight (direct and maternal effects) since 2002. In our dataset, does were inseminated every 42 days and kits were weaned at 32 days of age. All rabbits were born and weaned on a nucleus farm, i.e., in a highly bio-secure and controlled environment, where all candidates for selection were further tested. This farm is hereafter referred to as the selection environment (S). After being weighed at weaning, these rabbits were reared in two different environments: (1) the S environment or (2) sib-testing farms, which were three farms with less favorable sanitary conditions than the nucleus farm. The sib-testing farms are hereafter referred to as the challenging environment (Ch). Some full sibs and half sibs of the candidates for selection were tested in Ch. The aim of the challenging environment was to mimic the less protected environmental conditions encountered on some commercial rabbit farms. The sib-testing farms were semi-open rabbit farms with no artificial heating, cooling or ventilation systems. Rabbits of various age classes were reared in the same rooms, and the frequency of veterinary treatments was kept to a minimum. Sick rabbit groups were treated with water medication according to veterinary requirements. Rabbits (mostly males) from every second weaning batch were sent to the Ch environment at weaning. In total, 6,061 rabbits had health records in the Ch environment (5,864 males and 197 females) and 17,712 in the S environment (5,499 males and 12,213 females). The pedigree included 332 sires and 849 dams. A total of 228 sires had kits with health records in both environments, 29 sires in S only, and 1 sire in Ch only.

## Traits

Clinical signs of diseases occurring naturally on farms were recorded at a single time point, at the end of the test at 70 or 80 days of age. Even very mild clinical signs of disease were recorded. The most likely cause of death was also recorded after necropsy for rabbits that died between weaning and the end of the test. Disease traits were coded as 0 (absence) or 1 (disorder = morbidity at the end of the test or mortality between weaning and the end of the test). Clinical signs of disease were not recorded between weaning and the end of the test. Rabbits categorized as healthy (0) at 70 or 80 days of age might therefore have been sick individuals who had recovered. Disorders were further divided into the following categories: (1) digestive disease (Dig), which included diarrhea, bloated abdomen, and any form of digestive symptoms, (2) respiratory disease (Resp), which included nasal discharge, lung lesions, eye infection, and wry neck, and (3) non-specific diseases (Resist), which combined Dig, Resp, abnormally low weight, and other clinical signs of infectious origin. The disease traits were treated as separate traits depending on the environment (S or Ch), resulting in a total of 6 disease traits: Dig\_S, Resp\_S, Resist\_S, Dig\_Ch, Resp\_Ch, and Resist\_Ch. The production traits were the number of kits born alive per doe (NBA) and the weaning weight (WW), which were exclusively recorded in the S environment. Descriptive statistics of the disease and production traits are listed in **Tables 1**, **2**.

TABLE 1 | Total prevalence (in %) of non-specific diseases (Resist), respiratory (Resp) and digestive (Dig) diseases in selection (S) and challenging (Ch) environments from 2012 to 2016<sup>1</sup>.

<sup>1</sup>N = 17,712 in S and N = 6,061 in Ch.

<sup>2</sup>Very mild clinical signs of a disease were taken into account.

TABLE 2 | Number of records (N), mean and standard deviation (Std) for the number of kits born alive (NBA) and weaning weight (WW).

<sup>1</sup>Number of litters, <sup>2</sup>Number of rabbits.
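The coding of the binary disease traits described in this section can be sketched as follows. The function and argument names are hypothetical, and the sketch assumes that each input flag already combines morbidity at the end of the test with the recorded cause of death between weaning and the end of the test.

```python
def code_disease_traits(digestive, respiratory, low_weight, other_infectious, environment):
    """Return the three binary disease traits (0/1) for one rabbit.

    Each flag is True if the corresponding sign was seen at the end of the
    test or was the recorded cause of death before the end of the test.
    """
    if environment not in ("S", "Ch"):
        raise ValueError("environment must be 'S' or 'Ch'")
    dig = int(digestive)
    resp = int(respiratory)
    # The composite trait is 1 as soon as any sign of infectious origin is present.
    resist = int(digestive or respiratory or low_weight or other_infectious)
    return {
        f"Dig_{environment}": dig,
        f"Resp_{environment}": resp,
        f"Resist_{environment}": resist,
    }


# A rabbit reared in the challenging environment with respiratory signs only.
record = code_disease_traits(False, True, False, False, "Ch")
```

Because traits are keyed by environment, a given rabbit contributes to the S traits or to the Ch traits, never to both.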

#### Genetic Parameters Analyses

All traits were analyzed using a restricted maximum likelihood method, with the ASReml 3.0 software (Gilmour et al., 2009). Variance components and heritabilities were estimated using single-trait animal threshold models with a logit link function for the binary disease traits and multiple trait linear animal models for the production traits. Genetic and phenotypic correlations between traits were estimated using multiple trait linear animal models. The models included a random additive polygenic effect for all traits, a random common litter effect for the disease resistance traits and WW, a random maternal environmental effect and a random maternal genetic effect for WW, and a permanent environmental effect to account for the repeated measurement of NBA for does. For the 4% of kits cross-fostered at birth to another doe, the maternal genetic effect, the common litter effect, and the maternal environmental effects were assigned to the adoptive suckling mother. The significance of the fixed effects was determined for each trait using the Wald F statistic, which is similar to an ANOVA (Gilmour et al., 2009). Fixed effects were first tested together, and then a stepwise selection of the significant effects was applied. Significant fixed effects (P < 0.05) were maintained in subsequent analyses (**Table 3**). They were: batch and sex for the disease traits and WW, parity of the dam for WW, and a Ch farms effect for the disease traits measured in Ch. The combined effects of year-season of kitting and parity-physiological status (lactating or not at insemination) were applied for NBA.

TABLE 3 | Fixed effects included in the models (x), not significant (NS), or not tested (-) for non-specific diseases (Resist), respiratory (Resp), and digestive (Dig) disease in selection (S) and challenging (Ch) environments, for weaning weight (WW), and number of kits born alive (NBA).

<sup>1</sup>Lactating or not at insemination.

To estimate genetic correlations, we used analysis methods for continuous data, which are not theoretically optimal for binary traits. The suitable methodology is the threshold model (Gianola, 1982). However, the assumption of a continuous distribution for these traits is justified for genetic evaluation and for estimates of genetic correlations with continuous traits (Kadarmideen et al., 2003). Several studies showed that the estimates of heritability or breeding values from linear and threshold models are highly correlated (Matos et al., 1997a,b; Ramirez-Valverde et al., 2001). The difference between these methodologies has been shown to be negligible when the incidence of the binary response was between 25 and 75% (Meijering and Gianola, 1985). Except in the case where fixed effects were added, nested models were compared using the restricted likelihood ratio test. When the model comparison corresponded to a test of a parameter on the boundary of the parameter space (test of variance different from 0 or test of correlation different from 1), the distribution of the test statistic under the null hypothesis is a 50:50 mixture of χ<sup>2</sup><sub>q</sub> and χ<sup>2</sup><sub>q+1</sub> distributions (Morrell, 1998), where q is the number of random effects in the reduced model (residual effect excluded). To obtain the standard error of the heritability, we sampled the variance components 10,000 times using the multivariate sampling approach described by Houle and Meyer (2015) and computed the heritability for each sample. The standard deviation across samples was then used as an estimate of the standard error of the heritability. In addition, we checked the normality of the distribution obtained. If the hypothesis of normality was not rejected, we then used a Student's t-test to compare heritabilities obtained in the S and Ch environments.
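The boundary likelihood ratio test described above can be sketched with the standard library alone. The function names and log-likelihood values are hypothetical; the sketch only assumes that the test statistic is twice the difference in restricted log-likelihood between the full and reduced models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 0."""
    if df == 0:
        return 1.0 if x <= 0 else 0.0  # chi-square with 0 df is a point mass at 0
    z = x / 2.0
    if df % 2:  # odd df: start from the closed form for df = 1
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:  # even df: start from the closed form for df = 2
        a, q = 1.0, math.exp(-z)
    # Recurrence of the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def boundary_lrt_p_value(loglik_full, loglik_reduced, q):
    """P-value under a 50:50 mixture of chi-square(q) and chi-square(q + 1)."""
    lrt = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return 0.5 * chi2_sf(lrt, q) + 0.5 * chi2_sf(lrt, q + 1)


# Hypothetical restricted log-likelihoods (illustration only).
p_value = boundary_lrt_p_value(loglik_full=-1000.0, loglik_reduced=-1001.5, q=0)
```

Compared with a naive chi-square test on one degree of freedom, the mixture halves the p-value when q = 0, which reflects that the tested parameter can only deviate from its null value in one direction.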

## Simulation of Breeding Schemes

To illustrate the genetic progress that could be obtained for non-specific disease resistance, we tested various breeding schemes including Resist\_S or Resist\_Ch.

#### General Parameters

We used the deterministic simulation program SelAction 2.2 (Rutten et al., 2002) to compare the expected selection responses for breeding schemes including resistance to non-specific disease. SelAction predicts responses to selection on pseudo-BLUP estimated breeding values. In our study, we used the option of discrete generations and 1-stage selection. We simulated a selection nucleus of 140 dams and 35 sires. We assumed 7 progeny per litter with a sex ratio of ½. The selection intensity was 15% for males and 25% for females. Rabbits were selected at 70 days of age. The genetic parameters used for the simulation were those obtained in the first part of the study and the variance components obtained with a linear model for Resist\_S and Resist\_Ch.
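For orientation, the selected proportions quoted above can be converted into standardized selection intensities under textbook truncation selection on a normally distributed criterion. This is a simplified sketch that assumes an infinite population and interprets the 15% and 25% figures as selected proportions; it is not a reimplementation of SelAction, which may apply further corrections.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential for truncation selection."""
    std_normal = NormalDist()
    # Truncation point above which the best candidates are retained.
    truncation_point = std_normal.inv_cdf(1.0 - proportion_selected)
    # Mean of the selected group, in standard deviation units.
    return std_normal.pdf(truncation_point) / proportion_selected


i_males = selection_intensity(0.15)
i_females = selection_intensity(0.25)
```

The smaller selected proportion in males yields the higher intensity, so most of the selection pressure in such a scheme comes from the sire side.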

#### Alternative Breeding Schemes

We compared expected selection responses for fictive breeding objectives including disease resistance in different ways. The four tested breeding objectives were:

H<sub>Resist\_S\_Ch</sub> = 3 × A<sub>NBA</sub> + 0.15 × A<sub>WW\_direct</sub> + 0.15 × A<sub>WW\_maternal</sub> − 65 × A<sub>Resist\_S</sub> − 65 × A<sub>Resist\_Ch</sub>

H<sub>Resist\_S</sub> = 3 × A<sub>NBA</sub> + 0.15 × A<sub>WW\_direct</sub> + 0.15 × A<sub>WW\_maternal</sub> − 130 × A<sub>Resist\_S</sub>

H<sub>Resist\_Ch</sub> = 3 × A<sub>NBA</sub> + 0.15 × A<sub>WW\_direct</sub> + 0.15 × A<sub>WW\_maternal</sub> − 130 × A<sub>Resist\_Ch</sub>

H<sub>Production</sub> = 3 × A<sub>NBA</sub> + 0.15 × A<sub>WW\_direct</sub> + 0.15 × A<sub>WW\_maternal</sub>

with A<sub>x</sub> denoting the true breeding value for trait x. NBA was expressed in number of kits, WW\_direct and WW\_maternal in g, and Resist\_S and Resist\_Ch in %. The corresponding weights were given in euros per physical unit of the traits.

The weights did not have any economic meaning, as they were derived by trial and error based on the desired gain methodology (Brascamp, 1984). Pre-trial simulations with SelAction (Rutten et al., 2002) based on a breeding objective including all traits were run to estimate the genetic gain for various sets of weights. Then, a set of weights was arbitrarily chosen to improve non-specific disease resistance in both S and Ch, while increasing the weaning weight and maintaining a stable NBA, resulting in the HResist\_S\_Ch breeding objective presented above.

For comparison, we also assessed the selection response for a breeding objective including only production traits (HProduction) by using the same weights on the production traits. We studied the indirect expected selection response for each trait, even when it was excluded from the breeding objective, for instance the response in Resist\_S and Resist\_Ch when selection occurred for HProduction.

For each breeding objective including Resist\_S or Resist\_Ch, Resist was recorded with one or the other of the following modalities:


For HProduction, we considered that disease resistance traits were not recorded.

#### Correlations Between Breeding Objectives

The correlations between breeding objectives were calculated as:

$$\begin{aligned} \text{Covariance} \left( H\_i, H\_j \right) &= \; W\_i^T \times \; \text{Var} \left( A\_{i,j} \right) \times \; W\_j\\ \text{Correlation} \left( H\_i, H\_j \right) &= \frac{\text{Covariance} \left( H\_i, H\_j \right)}{\sqrt{\sigma\_{H\_i}^2 \times \sigma\_{H\_j}^2}} \end{aligned}$$

with **W**<sub>i</sub><sup>T</sup> being the row vector (1 × n) of the weights of the breeding objective H<sub>i</sub>, **W**<sub>j</sub> the column vector (m × 1) of the weights of the breeding objective H<sub>j</sub>, Var(**A**<sub>i,j</sub>) the genetic variance-covariance matrix (n × m) of the traits in the breeding objectives H<sub>i</sub> and H<sub>j</sub>, and σ<sup>2</sup><sub>Hi</sub> the variance of the breeding objective H<sub>i</sub>.
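The correlation between two breeding objectives can be sketched numerically as below. The weights are those of the four objectives defined earlier, written over a common trait list with a zero weight for traits absent from an objective. The genetic (co)variance matrix is hypothetical: its variances are placeholders and production traits are assumed uncorrelated with each other and with the disease traits; only the genetic correlation of 0.70 between Resist\_S and Resist\_Ch is taken from this study.

```python
import math

TRAITS = ["NBA", "WW_direct", "WW_maternal", "Resist_S", "Resist_Ch"]

# Weights of the four breeding objectives (0 = trait not in the objective).
WEIGHTS = {
    "HResist_S_Ch": [3, 0.15, 0.15, -65, -65],
    "HResist_S": [3, 0.15, 0.15, -130, 0],
    "HResist_Ch": [3, 0.15, 0.15, 0, -130],
    "HProduction": [3, 0.15, 0.15, 0, 0],
}

# HYPOTHETICAL genetic (co)variance matrix, in the order of TRAITS.
# Covariance between Resist_S and Resist_Ch = 0.70 * sqrt(4.0 * 4.0) = 2.8.
GENETIC_VAR = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 900.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 900.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 2.8],
    [0.0, 0.0, 0.0, 2.8, 4.0],
]


def quadratic_form(w_i, var_a, w_j):
    """Compute w_i' * Var(A) * w_j for weight vectors on the same trait list."""
    n = len(w_i)
    return sum(w_i[r] * var_a[r][c] * w_j[c] for r in range(n) for c in range(n))


def objective_correlation(w_i, w_j, var_a):
    """Correlation between two breeding objectives H_i and H_j."""
    covariance = quadratic_form(w_i, var_a, w_j)
    var_i = quadratic_form(w_i, var_a, w_i)
    var_j = quadratic_form(w_j, var_a, w_j)
    return covariance / math.sqrt(var_i * var_j)


corr_s_ch = objective_correlation(WEIGHTS["HResist_S"], WEIGHTS["HResist_Ch"], GENETIC_VAR)
```

With the placeholder variances used here, the heavily weighted disease traits dominate both objectives, so the correlation between them lands close to the genetic correlation between Resist\_S and Resist\_Ch. With real estimates the balance would depend on the actual variances and covariances.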

#### RESULTS

#### Phenotypes

The phenotypes analyzed are shown in **Tables 1**, **2**. The moderately high disease prevalence over the test period (26% for Resist\_S and 41% for Resist\_Ch) reflects the accurate recording of even the slightest clinical signs of disease. Disease prevalence was higher in Ch than S: 5 percentage points for Dig, 10 percentage points for Resp, and 15 percentage points for Resist. The average NBA was 9.92 ± 3.34 kits, and WW was 664 ± 102 g.

TABLE 4 | Estimates of variance components, heritabilities, common litter effect, and genetic correlations between selection (S) and challenging (Ch) environments for non-specific disease (Resist), respiratory (Resp), and digestive (Dig) disease (± standard errors).


σ<sup>2</sup><sub>a</sub>, direct genetic variance; σ<sup>2</sup><sub>com.litter</sub>, common litter effect variance; σ<sup>2</sup><sub>e</sub>, residual variance; σ<sup>2</sup><sub>p</sub>, phenotypic variance; h<sup>2</sup>, direct heritability; c<sup>2</sup><sub>com.litter</sub>, common litter effect; r<sub>g</sub>, genetic correlation. Variance components, heritability, and common litter effect were estimated with a single-trait animal threshold model (logit transformation). Genetic correlations between S and Ch were estimated with a two-trait linear animal model.

<sup>1</sup>Genetic correlation values in bold type are significantly different from one at P < 0.001.


TABLE 5 | Genetic correlations (above diagonal) and phenotypic correlations (below diagonal) for non-specific disease (Resist), respiratory (Resp), and digestive disease (Dig) in selection (S) and challenging (Ch) environments (± standard errors).

Correlations were estimated with two-trait linear animal models.

Values in bold type are significantly different from zero at P < 0.01.

#### Genetic Parameters of Disease Resistance Traits in Selection and Challenging Environments

The genetic parameters of the disease traits are provided in **Table 4**. The genetic correlations between environments for each disease trait are given in **Table 5**. The heritabilities of the disease traits were low, ranging from 0.04 ± 0.01 (Resist\_S) to 0.11 ± 0.03 (Dig\_Ch). Heritabilities tended to be higher in Ch than in S. However, the difference was not significantly different from zero. The common litter effect also tended to be higher in S (0.05 ± 0.01) than in Ch (0.01 ± 0.01). The genetic and phenotypic correlations between Resist on the one hand, and Dig and Resp on the other hand, were moderate to high, ranging from 0.52 ± 0.13 to 0.74 ± 0.08, and were similar in both environments (**Table 5**). The phenotypic correlation between Resp and Dig was moderate and negative, while the genetic correlation was negative but not significantly different from zero. The genetic correlations between Ch and S for each disease resistance trait were below unity, demonstrating significant interaction between the genotype and the environment for Resist and Dig, but not for Resp (**Table 4**). The genetic correlation between S and Ch environments was higher for Resp (0.84 ± 0.12) than for Dig (0.48 ± 0.16), the estimate for the composite trait Resist being intermediate (0.70 ± 0.13).

#### Genetic Parameters of Production Traits and Correlations With Disease Traits

The variance component estimates of the production traits are shown in **Table 6** and the correlations with the disease resistance traits in **Table 7**. The heritability of NBA was low (0.16 ± 0.03); the direct heritability of WW was moderate (0.29 ± 0.04) and its maternal heritability was very low (0.05 ± 0.02). The phenotypic correlation between NBA and WW was null. The genetic correlation between NBA and direct effects for WW was negative but not significantly different from zero. In contrast, the genetic correlation between NBA and WW\_maternal was moderate and positive (0.51 ± 0.16). The phenotypic correlations between Resist\_S or Resist\_Ch and the production traits were negative and low to moderate. Most of the genetic correlations between Resist\_S or Resist\_Ch and the production traits were not significantly different from zero, except for the correlation between Resist\_S and WW\_direct, which was negative (i.e., favorable).

TABLE 6 | Estimates of variance components, heritabilities, common litter effect, permanent environment effect for the number of kits born alive (NBA) and weaning weight (WW) (± standard errors).


σ<sup>2</sup><sub>a</sub>, direct genetic variance; σ<sup>2</sup><sub>com.litter</sub>, common litter effect variance; σ<sup>2</sup><sub>mat.env</sub>, maternal environment variance; σ<sup>2</sup><sub>perm.env</sub>, permanent environmental variance; σ<sup>2</sup><sub>m</sub>, maternal genetic variance; σ<sup>2</sup><sub>e</sub>, residual variance; σ<sup>2</sup><sub>p</sub>, phenotypic variance; h<sup>2</sup>, direct heritability; c<sup>2</sup><sub>com.litter</sub>, common litter effect; c<sup>2</sup><sub>mat.env</sub>, maternal environment effect; m<sup>2</sup>, maternal heritability; repeatability, (σ<sup>2</sup><sub>a</sub> + σ<sup>2</sup><sub>perm.env</sub>)/σ<sup>2</sup><sub>p</sub>. All variance components were estimated with three-trait linear animal models.

#### Expected Genetic Gain

We compared the selection response for four breeding objectives including Resist\_S, Resist\_Ch, or both, or only production traits. The breeding objectives were highly correlated, with correlations greater than or equal to 0.75 (**Table 8**). The correlation was very high between the breeding objectives with Resist\_S or Resist\_Ch on the one hand and the breeding objective including both traits on the other hand (0.93).

The expected genetic gain is provided in **Table 9** for the four breeding objectives and the two recording modalities for Resist\_Ch. On average across scenarios, the expected genetic gain per generation expressed in genetic standard deviation was −0.45 for Resist\_S, −0.27 for Resist\_Ch, −0.02 for NBA, 0.44 for WW\_dir, and −0.19 for WW\_mat. The selection response was very stable for Resist\_S across the breeding schemes that included Resist\_S or Resist\_Ch in the breeding objective, with values ranging from −4.5 to −4.8%. The selection response increased for Resist\_Ch along with the weight given to this trait in the breeding objective. The highest selection response for both Resist\_S and Resist\_Ch was obtained for the breeding scheme Production + Resist\_Ch with records for Resist\_Ch, with −4.8% for Resist\_S and −5.1% for Resist\_Ch. For each breeding objective including a disease resistance trait, we quantified the expected genetic gain obtained using records from the challenging environment. Recording Resist in the challenging environment (on a quarter of the sibs of the selection candidates) always improved the selection response for Resist\_Ch for all the breeding objectives. This additional genetic progress led to a reduction of disease incidence for Resist\_Ch of 1 percentage point per generation (on average across scenarios), which represents an additional genetic gain of 25% (compared with the scenarios where Resist\_Ch was not recorded).


TABLE 7 | Genetic correlations (above diagonal) and phenotypic correlations (below diagonal) for non-specific diseases (Resist) in selection (S) or challenging (Ch) environments, number of kits born alive (NBA), and the direct and maternal effects of weaning weight (WW) (± standard errors).

Correlations were estimated with three-trait linear animal models.

Values in bold type are significantly different from zero at P < 0.05.

TABLE 8 | Correlations between the breeding objectives<sup>1</sup>.


<sup>1</sup>HProduction, breeding objective including the direct and maternal component of weaning weight and the number of kits born alive.

HResist\_S, HProduction + Resistance for non-specific diseases in the selection environment.

HResist\_Ch, HProduction + Resistance for non-specific diseases in the challenging environment.

HResist\_S\_Ch, HProduction + Resistance for non-specific diseases in the selection and in the challenging environment.

The correlated selection response for disease resistance traits was low but still favorable when these traits were not included in the breeding objective and not recorded (−2.6% for Resist\_S and −0.6% for Resist\_Ch with HProduction). HProduction had the highest genetic gain for WW\_direct (35 g), but the lowest for all the other traits. Unfavorable trends were obtained for WW\_mat with all scenarios (−4.3 g on average), but they were smaller (−2.2 g) for the scheme HResist\_Ch with records for Resist\_Ch. However, there was less genetic gain for WW\_dir with this scenario (16.5 g). NBA was also more stable for this scenario, with a very slight increase of the trait (0.005 kits).

#### DISCUSSION

Improving resistance to non-specific diseases seems to be possible in both selection and more challenging environments. Resistance to non-specific diseases is a heritable trait in both environments.

### Breeding for Disease Resistance or Tolerance to Non-specific Diseases?

In this study, we considered that an animal was affected with a disease if it showed at least one clinical symptom of infection at a single time point. Such observations of animals under normal or more challenging production conditions are a simple and direct approach to selecting for genetic resistance. However, the expression of resistance to disease is questionable (Rothschild, 1998). Sick animals without symptoms may have been categorized as healthy (poor sensitivity), while healthy or recovering rabbits may have been categorized as sick (poor specificity). This could lead to an underestimation of heritability (Bishop and Woolliams, 2010). The "true" heritability of disease resistance is likely to be higher than our estimates. Nevertheless, the measure proposed here for Resist is simple, easy to record, can be routinely collected on farms, and seems to be a good proxy for improving the resistance and tolerance to the most common diseases faced by animals in various production conditions. No experimental challenges are required and no particular disease is given priority over another. As emphasized by various authors (Guy et al., 2012; Merks et al., 2012), the main issue for effective genetic selection for disease resistance is the identification of phenotypes that can be easily measured and routinely collected on farms. In our manuscript, we use the term "selection for disease resistance" in a very general way to describe the genetic improvement of rabbit health. However, it may include both resistance and tolerance to disease. Host resistance refers to the ability to reduce pathogen replication within a host, whereas tolerance refers to the ability to reduce the impact of pathogens on host performance without necessarily affecting the pathogen burden (Doeschl-Wilson and Kyriazakis, 2012). Tolerance can also be more broadly assessed against abiotic factors (temperature) or production diseases (Kause and Ødegård, 2012). 
Our phenotypes were based on observed clinical signs at a definite time. They included no information about the presence of pathogens and the infection dynamics, if any. Recording the pathogen burden seems unfeasible on farms due to the high number of pathogen types and the cost of such analyses when trying to limit production costs. The healthy phenotype in our study could therefore be an expression of resistance (the animals succeed in reducing pathogen replication), tolerance to pathogens (the animal is a carrier of the pathogen but maintains its level of performance without showing clinical signs), or tolerance to metabolic disorders (e.g., noninfectious digestive disorders), or result from the absence of contact with pathogens. It has been argued that disease resistance mechanisms are often pathogen-specific, while tolerance mechanisms that prevent or repair damage may be more host than pathogen specific, and may thus offer generic protection for a range of pathogens (Doeschl-Wilson and Kyriazakis, 2012). In our case, we may be improving tolerance, and we may also be improving resistance by selecting animals with a more efficient innate immune response. As observed by Glass (2012), "distinct host resistance and tolerance traits may be less common than traits that involve elements of both strategies which are likely to have evolved together to overcome infectious threats."

TABLE 9 | Expected direct selection responses or correlated responses to selection per generation (10 months) in trait units for non-specific disease resistance (Resist) in the selection (S) or challenging (Ch) environment, for number of kits born alive (NBA), and for the direct and maternal components of weaning weight (WW) for four alternative breeding objectives<sup>1</sup> including or not records for Resist\_Ch.


<sup>1</sup>HProduction, breeding objective including the direct and maternal component of weaning weight and the number of kits born alive.

HResist\_S, HProduction + Resistance for non-specific diseases in the selection environment.

HResist\_Ch, HProduction + Resistance for non-specific diseases in the challenging environment.

HResist\_S\_Ch, HProduction + Resistance for non-specific diseases in the selection and in the challenging environment.

<sup>2</sup>NBA, WW\_direct and WW\_maternal are the only traits recorded for this breeding scheme.

## Heritability of Resistance to Non-specific Diseases

The heritability of the disease traits was low. Similar heritabilities were found in French paternal rabbit lines (Gunia et al., 2015) with heritabilities ranging from 0.030 ± 0.003 to 0.041 ± 0.004 for disease traits on the underlying scale. Other studies estimated higher heritability on the observed scale for disease traits: 0.12 ± 0.05 for bacterial infections in Australian rabbits (Eady et al., 2007), 0.17 ± 0.09 to 0.30 ± 0.06 for non-specific mortality, respiratory diseases, and epizootic rabbit enteropathy in Spanish paternal rabbit lines (Ragab et al., 2015). We found no genetic correlations between respiratory and digestive diseases. This result confirms previous results obtained in paternal lines (Gunia et al., 2015). Nonetheless, contrary to the previous study in which only the main disease syndrome was recorded, two disease syndromes could be recorded for each animal in the present dataset. This reduced the bias caused by the limited recording of multiple disease symptoms for the same animal. However, if the two main syndromes were from the same kind of disease (either digestive or respiratory), a third syndrome from another kind of disease would not be registered. The absence of genetic correlation could partly result from this recording method. The composite trait Resist was genetically correlated with Resp and Dig, and the correlations were similar in the Ch and S environments. Resist could therefore be a good indicator for improving general disease resistance and reducing the sensitivity of rabbits to digestive and respiratory diseases.

#### Genotype by Environment Interactions for Disease Resistance Traits

Genotype by Environment (G×E) interactions were demonstrated for Resist and Dig, with genetic correlations significantly below unity between S and Ch. We also observed a scaling effect, with higher genetic variances in Ch than S. They were compensated by lower variances due to common litter environment, leading to very similar total variance between S and Ch. The lower between-environment genetic correlation observed for Dig (0.48 ± 0.16) compared with Resp (0.84 ± 0.12) may be explained by the higher variability of digestive diseases compared with respiratory diseases. Digestive syndromes can be caused by various pathogens and give rise to various diseases (epizootic rabbit enteropathy, coccidiosis, enterotoxaemia, colibacillosis) that differ among challenging environment farms, whereas respiratory syndromes are mainly caused by Pasteurella multocida in rabbits. In pigs, G×E interactions have been described for production traits between nucleus and testing farms (Merks, 1989), and even between farms of good health status (Hermesch et al., 2015). Another study in rabbits reported G×E interactions for nonspecific mortality, respiratory diseases, and epizootic rabbit enteropathy between animals fed ad libitum and a restricted diet (Ragab et al., 2015) with genetic correlations for the disease traits between the two feeding systems ranging from 0.26 ± 0.09 to 0.68 ± 0.07. In poultry, genetic correlations ranging from 0.78 to 0.82 were observed for footpad dermatitis of broilers reared in two contrasting environments. G×E interactions for disease resistance have also been reported for different times of infection occurring naturally in farms, for example resistance and resilience from low to high worm challenges in sheep (Riley and Van Wyk, 2009). 
G×E interactions have been observed for reproduction traits during high and low challenge loads (due to natural disease agent and other stressors) in pig farms (Herrero-Medrano et al., 2015), or before and after an outbreak of Porcine Reproductive and Respiratory Syndrome (Lewis et al., 2008).

Various ways to account for G×E interactions in selection have been proposed depending on the magnitude of the genetic correlation of traits evaluated between environments. Robertson (1959) proposed that genetic correlations below the threshold of 0.80 could be considered as having biological importance, with significant reranking of animals occurring across environments. Alternative breeding strategies can therefore be considered. Simulations in dairy cattle have shown that developing specific breeding programs for each environment is advantageous when the genetic correlation falls below 0.61 (Mulder et al., 2006). The genetic correlation between Resist\_Ch and Resist\_S is 0.70 ± 0.13, which means that the application of a common breeding program in both environments to improve Resist could be an appropriate strategy. However, rabbit breeding schemes differ from those for dairy cattle: they are pyramidal (as in poultry and pigs) without progeny testing. Therefore, further research needs to be undertaken to compare the advantages of running common or separate breeding schemes for pyramidal selection.
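The two thresholds quoted above can be read as a simple decision rule. The sketch below is only an illustration of that reading: the function name and the three labels are hypothetical, the rule uses the point estimate alone and ignores its standard error, and the 0.61 threshold comes from dairy cattle simulations, so it may not transfer to pyramidal rabbit schemes.

```python
def gxe_strategy(rg, importance_threshold=0.80, separate_threshold=0.61):
    """Classify a between-environment genetic correlation (illustrative rule).

    importance_threshold : below this, G×E is regarded as biologically
                           important (Robertson, 1959).
    separate_threshold   : below this, environment-specific breeding programs
                           were advantageous in dairy cattle simulations
                           (Mulder et al., 2006).
    """
    if not -1.0 <= rg <= 1.0:
        raise ValueError("a genetic correlation must lie in [-1, 1]")
    if rg < separate_threshold:
        return "separate"                 # consider one program per environment
    if rg < importance_threshold:
        return "common_with_reranking"    # common program, reranking expected
    return "common"                       # common program, little reranking


# Point estimate reported for Resist between S and Ch environments.
print(gxe_strategy(0.70))  # common_with_reranking
```

With a standard error of 0.13 around the estimate of 0.70, the interval of plausible values spans both thresholds, which is one reason the text calls for further research rather than a firm recommendation.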

#### Disease Resistance and Production Traits

Estimates of variance components for production traits were generally consistent with the literature (Garcia and Baselga, 2002; Mocé and Santacreu, 2010; Loussouarn et al., 2012; David et al., 2015). The correlations between Resist and the production traits were mostly not significantly different from zero or favorable. To our knowledge, no genetic correlations between disease traits and traits selected in maternal rabbit lines have been reported previously. A study in a paternal rabbit line reported that genetic correlations between disease resistance traits and body weight at 63 or 70 days and carcass yield were not significantly different from zero or favorable (Gunia et al., 2015). However, in their review, Stear et al. (2001) observed that genetic correlations between production and disease resistance traits in livestock can be either favorable or unfavorable. The genetic correlations between production traits and Resist\_S or Resist\_Ch were very similar, despite different magnitudes of genetic variance for Resist in S and Ch.

### Selection Strategies to Improve Disease Resistance Across Environments

Our aim is to reduce the prevalence of disease through selection for host resistance. All the breeding schemes tested showed that we could expect a reduction of disease incidence of 4 to 6% per generation (for the same level of exposure of rabbits to pathogens). If we assume linear genetic progress for the disease resistance traits, we could expect prevalence to fall from 41 to 31% for Resist\_Ch and from 26 to 21% for Resist\_S over 5 generations. However, as stated by Mackenzie and Bishop (1999), predicting the consequences and benefits of selection is a difficult step, because altering the genetics of individual animals affects the epidemiology of the disease at the population level. Indeed, if we select rabbits for resistance to disease, pathogen exposure is likely to change. As the number of susceptible animals in the population decreases, pathogen transmission among rabbits will also decrease. Bishop and Stear (1997) modeled the response to selection for resistance to parasites in sheep and obtained better responses to selection than predicted by quantitative genetic theory due to changes in the epidemiology of the disease.

The genetic progress on disease resistance traits was always favorable, even when disease resistance traits were not included in the breeding objective. This result is due to favorable correlations with the other traits. However, as some genetic correlations between production traits and Resist\_S or Resist\_Ch were not significantly different from zero, the "true" correlated response on disease resistance may be null if we select for HProduction. Except for the breeding scheme with HProduction, the genetic progress on Resist\_S was very stable, probably because the amount of information available was always high. For this trait, records on the individual performances of the selection candidates as well as for a large number of sibs were always available for all scenarios including a disease resistance trait in the breeding objective. The genetic correlation between Resist\_S and Resist\_Ch therefore enabled good genetic progress for Resist\_S even when this trait was not directly included in the breeding objective. This relatively high correlation between Resist\_S and Resist\_Ch and the higher heritability of Resist\_Ch could explain why the scenario "Resist\_Ch" gave the highest genetic gain for both Resist\_Ch and Resist\_S. Recording Resist\_Ch always resulted in higher genetic progress for this trait, as could be expected. This finding is in accordance with previous results reported in the literature. Testing half-sibs under commercial conditions is considered a good option to maintain genetic gain in the presence of G×E (Mulder and Bijma, 2005). Improving disease resistance in both environments is important, due to the high variability of environments on commercial farms (climate, pathogen loads). Some commercial farms apply high biosecurity measures and their environment is similar to the S environment, while on others the conditions are closer to our Ch environments. 
The breeding objectives presented here were based on the desired gain methodology, where the weight applied to the trait is based on the target genetic progress for the breeding company. The weight given to Resist in our simulations was high. Other scenarios with more balanced weights among traits may have led to less progress for Resist. Another way to define the breeding objective would be to use economic weights and to weight the genetic gain in each environment by the relative importance of that environment (Mulder et al., 2006). These weights could reflect the size of the doe population in each environment. Economic weights have previously been derived for rabbit meat production (Cartuche et al., 2014). Litter size was the trait with the highest economic value, and was considered 5 times more important (expressed per standard deviation) than fattening survival. The discrepancy between the two methodologies can also reflect the strategic choices of different breeding companies.


#### CONCLUSION

Selection on non-specific disease resistance or tolerance using simple observations seems to be feasible. The trait is heritable, and the genetic correlations with the other traits under selection are not significantly different from zero or favorable. G×E interactions exist for this trait between selection and challenging environments. Therefore, recording this trait in both environments results in higher genetic progress. Long-term prediction of the genetic gain is difficult, due to the probable changes in disease epidemiology caused by selection. Quantitative genetics theory predicts a reduction of disease incidence by 4–6% per generation. The true genetic gain is likely to be greater than that predicted in the present study. Such selection could have a major impact on the reduction of antibiotic use and on the improvement of animal welfare. The biological mechanisms underlying non-specific disease resistance are not fully understood yet. Besides classical immune parameters, the gut microbiota has recently emerged as a key regulator of immunity. Further studies using high throughput and deep phenotyping approaches of extreme animals with the highest estimated breeding values for disease resistance and disease sensitivity are needed to unravel the mechanisms in play.

#### AUTHOR CONTRIBUTIONS

MG performed the analyses, critically interpreted the data, and prepared the manuscript. HerG made substantial contributions to the analyses and interpretation of data. ID made substantial contributions to the analysis, critically interpreted the data, and reviewed the manuscript. JH and MM designed the experimental protocol and critically interpreted the data. HélG made substantial contributions to the interpretation of data and editing of the manuscript.

#### REFERENCES

severities of worm challenge in a Merino flock in South Africa. Vet. Parasitol. 164, 44–52. doi: 10.1016/j.vetpar.2009.04.014


**Conflict of Interest Statement:** This study was conducted under a research contract between INRA and Hypharm supervised by HerG. JH and MM are employed by Hypharm. However, the possible conflict of interest did not interfere with the outcome of this paper.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Gunia, David, Hurtaud, Maupin, Gilbert and Garreau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Genomic Architecture of Fowl Typhoid Resistance in Commercial Layers

Androniki Psifidi<sup>1,2</sup>\*, Kay M. Russell<sup>1</sup>, Oswald Matika<sup>1</sup>, Enrique Sánchez-Molano<sup>1</sup>, Paul Wigley<sup>3</sup>, Janet E. Fulton<sup>4</sup>, Mark P. Stevens<sup>1</sup> and Mark S. Fife<sup>5</sup>

<sup>1</sup> The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, United Kingdom, <sup>2</sup> Royal Veterinary College, University of London, Hatfield, United Kingdom, <sup>3</sup> Department of Infection Biology, Institute for Infection and Global Health, University of Liverpool, Neston, United Kingdom, <sup>4</sup> Hy-Line International, Dallas Center, IA, United States, <sup>5</sup> The Pirbright Institute, Surrey, United Kingdom

#### Edited by:

Martien Groenen, Wageningen University and Research, Netherlands

#### Reviewed by:

Xiangdong Ding, China Agricultural University, China Shaojun Liu, Hunan Normal University, China

\*Correspondence: Androniki Psifidi androniki.psifidi@roslin.ed.ac.uk; apsifidi@rvc.ac.uk

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 30 May 2018 Accepted: 15 October 2018 Published: 19 November 2018

#### Citation:

Psifidi A, Russell KM, Matika O, Sánchez-Molano E, Wigley P, Fulton JE, Stevens MP and Fife MS (2018) The Genomic Architecture of Fowl Typhoid Resistance in Commercial Layers. Front. Genet. 9:519. doi: 10.3389/fgene.2018.00519

Salmonella enterica serovar Gallinarum causes devastating outbreaks of fowl typhoid across the globe, especially in developing countries. With the use of antimicrobial agents being reduced due to legislation and the absence of licensed vaccines in some parts of the world, an attractive complementary control strategy is to breed chickens for increased resistance to Salmonella. The potential for genetic control of salmonellosis has been demonstrated by experimental challenge of inbred populations. Quantitative trait loci (QTL) associated with resistance have been identified in many genomic regions. A major QTL associated with systemic salmonellosis has been identified in a region termed SAL1. In the present study, two outbreaks of fowl typhoid in 2007 and 2012 in the United Kingdom were used to investigate the genetic architecture of Salmonella resistance in commercial laying hens. In the first outbreak 100 resistant and 150 susceptible layers were genotyped using 11 single nucleotide polymorphism (SNP) and 3 microsatellite markers located in the previously identified SAL1 region on chromosome 5. From the second outbreak 100 resistant and 200 susceptible layers, belonging to a different line, were genotyped with a high-density (600 K) genome-wide SNP array. Substantial heritability estimates were obtained in both populations (h<sup>2</sup> = 0.22 and 0.26, for the layers in the first and second outbreak, respectively). Significant associations with three markers on chromosome 5 located close to AKT1 and SIVA1 genes, coding for RAC-alpha serine/threonine protein kinase, and the CD27-binding protein SIVA1, respectively, were identified in the first outbreak. From analysis of the second outbreak, eight genome-wide significant associations with Salmonella resistance were identified on chromosomes 1, 6, 7, 11, 23, 24, 26, 28 and several others with suggestive genome-wide significance were found. Pathway and network analysis revealed the presence of many innate immune pathways related to Salmonella resistance. Although significant associations with SNPs located in the SAL1 locus were not identified by the genome-wide scan for layers from the second outbreak, pathway analysis revealed PI3K/AKT signaling as the most significant pathway. In summary, resistance to fowl typhoid is a heritable polygenic trait that could possibly be enhanced through selective breeding.

Keywords: fowl typhoid, chicken, layers, disease outbreak, GWAS, pathway

## INTRODUCTION


Salmonella enterica serovar Gallinarum causes a systemic bacterial disease mainly in adult poultry known as fowl typhoid. Outbreaks of this disease can have huge financial consequences, with infected flocks having reduced egg production and a high percentage of mortality (Shivaprasad, 2000; Barrow and Freitas Neto, 2011). Regulations across the European Union compel poultry producers to control Salmonella in their layer and broiler breeder flocks. For example, in the United Kingdom, the Poultry Health Scheme routinely tests farms for the presence of S. Gallinarum, resulting in rare occurrence of the disease after a prolonged control strategy (Poultry Health Scheme Handbook, 2013; Wigley, 2017). Despite such control measures, some outbreaks have been reported in recent years for both caged layers and backyard flocks in the United Kingdom, indicating that outbreaks do still occur with devastating effects (Cobb et al., 2005; Parmar and Davies, 2007). More worryingly, fowl typhoid has re-emerged in recent years in developing countries that have also established sanitary measures and official programs to prevent and control the disease. However, the disease remains endemic with cyclic or seasonal outbreaks related mainly to disease management (Revolledo, 2018). Therefore, a pressing need exists for complementary strategies to control the disease (Barbour et al., 2015; Guo et al., 2016; Celis-Estupinan et al., 2017; Pal et al., 2017; Weerasooriya et al., 2017).

Genetic selection for birds resistant to S. Gallinarum has been seen as an attractive solution for the control of fowl typhoid since the 1930s (Lambert and Knox, 1932). Inbred chicken lines have been described that exhibit heritable differences in resistance to systemic salmonellosis, including following oral S. Gallinarum inoculation or intravenous administration of S. Typhimurium (Bumstead and Barrow, 1993; Mariani et al., 2001). These lines have been extensively studied over the past 35 years, and crosses between these lines have been used to identify quantitative trait loci (QTL) for Salmonella resistance. A region on chromosome 5, termed SAL1, has been identified in multiple independent studies as having a protective role against systemic salmonellosis in the chicken (Mariani et al., 2001; Kaiser and Lamont, 2002; Tilquin et al., 2005; Calenge et al., 2010; Redmond et al., 2011). We refined the SAL1 major QTL by mapping resistance in a 6th generation backcross with inbred lines 6<sub>1</sub> (resistant) and 15I (susceptible) using a high-density SNP panel (Fife et al., 2009). The refined SAL1 region contains 14 genes with some noticeable candidates that have previously been linked with Salmonella resistance in other species, such as the RAC-alpha serine/threonine protein kinase homolog, AKT (Fife et al., 2009). It is noteworthy that distinct QTL have been associated with enteric carriage of S. Typhimurium (Fife et al., 2011).

The present study builds on and extends our previous studies in inbred lines, aiming to dissect the genomic architecture of fowl typhoid resistance using two different United Kingdom commercial layer populations which suffered from natural outbreaks of fowl typhoid. We conducted variance component analyses to estimate genetic parameters and genomic association studies to identify genomic regions controlling fowl typhoid resistance. We also performed gene enrichment and pathway analyses to identify candidate genes within the relevant genomic regions.

## MATERIALS AND METHODS

## Ethics Statement

All animal experiments were conducted in accordance with the revised Animals (Scientific Procedures) Act 1986 (project license PPL40/3652) with the approval of the local Ethical Review Body.

## Study Population

Two different commercial laying hen populations suffering from two separate S. Gallinarum outbreaks of fowl typhoid, in 2007 and 2012 in the United Kingdom, were used in this study. From the first outbreak, blood and liver samples from 250 layers (150 susceptible and 100 resistant) were collected.

The second outbreak affected a layer farm with 375,000 birds. While most of the infected birds succumbed to infection, about 0.1% of the birds showed some level of resistance, with only mild clinical signs. Ultimately all remaining birds were culled on humane grounds, to prevent further spread of infection. From this outbreak, blood, spleen, and liver samples were collected from 300 layers (200 susceptible and 100 resistant). Three liver samples were collected from each bird: one in tissue storage reagent RNAlater®, one in formalin for histological analysis, and one in phosphate-buffered saline (PBS) for enumeration of viable bacteria.

The collection of samples was performed by qualified veterinarians: samples were collected from birds raised in the same pens; live birds were culled and classified based on the observed pathology (lesions in liver, spleen, or ovary) as resistant or susceptible. Susceptible birds had extensive pathology implying potential death from lesions in the next 24 h. Resistant birds had no overt gross lesions on post mortem examination, with limited clinical signs.

For the first outbreak prevalence data was unavailable. For the second outbreak, the rate of infection varied between the six poultry houses on the affected premises. Levels of mortality consistent with clinical signs of fowl typhoid were recorded for the second outbreak with peak levels at approximately 3000 birds per day across the farm. Toward the end of the outbreak approximately 33% of birds had succumbed to infection. Birds for this study were sampled from the poultry house with the highest reported prevalence.

## Phenotyping

For the first outbreak the trait was binary [0/1; case (susceptible) vs. control (resistant)]. For the second outbreak S. Gallinarum load in liver was determined in colony-forming units (CFU)/gram as described previously (Mariani et al., 2001). Briefly, liver samples in PBS were weighed and homogenized in an equal v/w of PBS. The homogenized liver tissue was serially diluted and plated on Modified Brilliant Green Agar (Oxoid, United Kingdom), incubated overnight, and the numbers of bacterial colonies were counted. The number of CFU/g was log transformed in order to normalize the distribution. The trait for the second outbreak was analyzed using both continuous and binary phenotypes.
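As a rough illustration of the bacterial load calculation described above, the sketch below converts a plate count into log10 CFU per gram of liver. The helper name, the plated volume, the dilution factor, the assumption that homogenizing 1 g of tissue in an equal volume of PBS yields about 2 mL of homogenate, and the handling of plates without colonies are all assumptions made for illustration; the source does not specify them.

```python
import math


def log10_cfu_per_gram(colonies, dilution_factor, plated_volume_ml,
                       homogenate_ml_per_gram=2.0):
    """Return log10 CFU per gram of tissue, or None when no colonies grew.

    homogenate_ml_per_gram=2.0 is an assumption: 1 g of liver homogenized
    in an equal volume of PBS is taken to give roughly 2 mL of homogenate.
    """
    if colonies == 0:
        # log10(0) is undefined; how such samples were coded is not
        # stated in the source, so the sketch simply returns None.
        return None
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    cfu_per_gram = cfu_per_ml * homogenate_ml_per_gram
    return math.log10(cfu_per_gram)


# Hypothetical plate: 50 colonies from 0.1 mL of a 10^4-fold dilution
example_load = log10_cfu_per_gram(50, 1e4, 0.1)
```

With these hypothetical inputs the load works out to about 7 log10 CFU/g, which is in the range reported for susceptible birds.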

## Histology and Assessment of Pathogenicity

fgene-09-00519 November 15, 2018 Time: 17:39 # 3

Histological analyses were performed on liver and spleen samples from birds from the second outbreak. Samples of liver and spleen were fixed in formalin, paraffin-wax embedded, then cut and stained with haematoxylin and eosin by the Department of Veterinary Pathology, University of Liverpool. Tissues were observed and analyzed blind as described previously (Parsons et al., 2013).

Assessment of pathogenicity of the strain isolated from the second outbreak in an experimental infection model was made in comparison with two well characterized S. Gallinarum isolates SG9 and 287/91 (Jones et al., 2001), as described previously (Langridge et al., 2015). Briefly, groups of five 3-week-old Salmonella-free commercial brown egg layer chickens (Lohmann Brown) were infected orally with 10<sup>8</sup> CFU of each of the S. Gallinarum isolates or remained as an uninfected control. At 6 days post challenge all birds were killed and at post mortem examination the spleen, liver, and caecal contents were removed for enumeration of viable Salmonella on selective Modified Brilliant Green Agar (Oxoid, United Kingdom) as detailed previously (Langridge et al., 2015).

#### Genotyping

All the birds from the first outbreak were genotyped using 11 custom-made SNP and 3 microsatellite markers located in the previously identified SAL1 region on chromosome 5 (Fife et al., 2009). A full list of these markers is displayed in **Supplementary Table S1**. All the birds from the second outbreak were genotyped with the 600 K high density genome-wide SNP array (Affymetrix® Axiom® HD) (Kranis et al., 2013).

#### Heritability Analyses

Genetic parameters were estimated for S. Gallinarum resistance for the first and the second outbreak using a mixed linear univariate model that included the population principal components (for the second outbreak only) as a covariate effect, and the random effect of the individual bird. Genetic relationships between birds were calculated based on SNP genotypes using the genome-wide efficient mixed model association (GEMMA) algorithm (Zhou and Stephens, 2014) and included in the analyses. For the second outbreak the continuous phenotypes were used to estimate the variance components. The heritability of each trait was calculated as the ratio of the additive genetic to the total phenotypic variance. All above analyses were performed separately for each outbreak using the ASReml 4.0 software (Gilmour et al., 2009).
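In symbols, the univariate mixed model and the heritability ratio described above can be sketched as follows. The notation (y, X, β, u, e, G) is an assumption introduced here for illustration and is not taken from the source; it follows the usual way of writing a mixed model with a genomic relationship matrix.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0},\, \mathbf{G}\sigma_a^2\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\sigma_e^2\right)

h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}
```

Here y holds the phenotypes, Xβ the covariate effects (the principal components, for the second outbreak), u the random effect of the individual bird with genomic relationship matrix G, and e the residual; under this model the total phenotypic variance is the sum of the additive genetic and residual variances.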

#### Genomic Association Analyses

#### Single-Marker Genomic Association Analyses

For the first outbreak a single marker association analysis where the SNP genotype was fitted as a fixed effect and the genomic relatedness matrix was fitted as a random polygenic effect was performed using ASReml 4.0 software (Gilmour et al., 2009).

Data from the second outbreak were analyzed using two genome-wide association methodologies. Briefly, either a single SNP or a group of SNPs in sets of windows/regions (using a regional heritability mapping approach, RHM) were fitted as fixed effects.

The SNP genotype data were subjected to quality control measures using PLINK v1.09 (Purcell et al., 2007): minor allele frequency >0.05, call rate >95% and Hardy–Weinberg equilibrium (P > 10<sup>−6</sup>). After quality control, 297,560 SNP markers remained for further analysis. Positions of SNP markers were obtained using the Gal-gal5 assembly in Ensembl Genome Browser<sup>1</sup>.
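The three quality control filters above can be illustrated for a single SNP with the minimal sketch below. It is not PLINK's implementation: the function name and the 0/1/2 genotype coding are assumptions, and a simple one-degree-of-freedom chi-square approximation is swapped in for the Hardy–Weinberg test.

```python
import math


def snp_passes_qc(genotypes, maf_min=0.05, call_rate_min=0.95,
                  hwe_p_min=1e-6):
    """Apply MAF, call-rate and HWE filters to one SNP.

    genotypes: list of allele counts (0, 1 or 2), with None for a
    missing call. Thresholds default to the values quoted in the text.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    n = len(called)
    call_rate = n / len(genotypes)
    freq = sum(called) / (2 * n)          # frequency of the counted allele
    maf = min(freq, 1 - freq)
    if call_rate <= call_rate_min or maf <= maf_min:
        return False
    # Chi-square goodness of fit to Hardy-Weinberg proportions (1 df)
    p, q = freq, 1 - freq
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return hwe_p > hwe_p_min
```

A SNP with genotype counts in Hardy–Weinberg proportions (for example 36/48/16) passes, whereas a SNP with no heterozygotes among 100 birds fails the equilibrium filter.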

Population stratification was investigated using a genomic relatedness matrix generated from all individuals. This genomic relatedness matrix was converted to a distance matrix that was used to carry out classical multidimensional scaling analysis (MSA) using the GenABEL package of R (Aulchenko et al., 2007), to obtain its principal components.

The GEMMA algorithm (Zhou and Stephens, 2014) was used to perform GWAS analyses using a standard univariate linear mixed model in which the first four principal components were fitted as covariate effects to adjust for population structure and the genomic relatedness matrix among individuals was fitted as a polygenic effect. After Bonferroni correction for multiple testing, significance thresholds were P ≤ 1.68 × 10<sup>−7</sup> and P ≤ 3.36 × 10<sup>−6</sup> for genome-wide significant (P ≤ 0.05) and suggestive (namely one false positive per genome scan) levels, respectively, corresponding to −log<sub>10</sub>(P) of 6.77 and 5.47. The Chi-square (χ<sup>2</sup>) test was implemented to validate the GWAS results. A P-value for each comparison (expected vs. observed values) was estimated based on the χ<sup>2</sup> statistic value for two degrees of freedom. The significance threshold was set at P ≤ 0.05. The extent of linkage disequilibrium (LD) between significant SNPs located on the same chromosome regions was calculated using the r-square statistic of PLINK v1.09 (Purcell et al., 2007).
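The Bonferroni thresholds quoted above follow directly from the number of SNPs retained after quality control. The short sketch below reproduces the arithmetic; the function name is hypothetical.

```python
import math


def bonferroni_thresholds(n_tests, alpha=0.05):
    """Return (genome-wide, suggestive) P-value thresholds.

    Genome-wide: alpha divided by the number of tests.
    Suggestive: one expected false positive per genome scan.
    """
    return alpha / n_tests, 1.0 / n_tests


gw, sg = bonferroni_thresholds(297_560)
print(f"{gw:.2e} {sg:.2e}")                                  # 1.68e-07 3.36e-06
print(round(-math.log10(gw), 2), round(-math.log10(sg), 2))  # 6.77 5.47
```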

#### Regional Heritability Mapping

The RHM approach was used to analyse data from the second outbreak fitting genomic regions of 20 SNPs in sliding "windows" along each chromosome. RHM analyses were performed using the DISSECT software (Canela-Xandri et al., 2015) fitting the same fixed effects as the ones used in the single SNP GWAS described above. The significance of genomic regions was assessed with the likelihood ratio test statistic, which was used to compare the RHM model where both the whole genome and a genomic region were fitted as random effects against the base model that excluded the latter effect. A total of 14,878 regions were tested across the genome. After Bonferroni correction for multiple testing, significance thresholds were P ≤ 3.37 × 10<sup>−6</sup> and P ≤ 6.72 × 10<sup>−5</sup> for genome-wide (P ≤ 0.05) and suggestive (namely one false positive per genome scan) levels, respectively, corresponding to −log<sub>10</sub>(P) of 5.47 and 4.17.

<sup>1</sup>www.ensembl.org
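The likelihood ratio test used for each RHM region can be sketched as below. This is not the DISSECT implementation: the function name and the log-likelihood values are hypothetical, and a one-degree-of-freedom chi-square reference distribution is assumed because the source does not state which distribution was used (a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom is a common alternative for variance components and would halve the P-value).

```python
import math

# Genome-wide RHM threshold quoted in the text
RHM_GENOME_WIDE_P = 3.37e-6


def likelihood_ratio_test(loglik_region_model, loglik_base_model):
    """Compare a model with an extra regional variance component
    against the base model without it.

    Returns (LRT statistic, P-value under chi-square with 1 df).
    """
    lrt = max(0.0, 2.0 * (loglik_region_model - loglik_base_model))
    p_value = math.erfc(math.sqrt(lrt / 2.0))  # chi-square survival, 1 df
    return lrt, p_value


# Hypothetical log-likelihoods for one 20-SNP window
lrt, p = likelihood_ratio_test(-1000.0, -1012.0)
is_genome_wide = p <= RHM_GENOME_WIDE_P
```

As a side note, 297,560 SNPs divided into windows of 20 gives 14,878, which matches the number of regions tested.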

#### SNP and Candidate Region Annotation

All significant SNPs identified in the GWAS for the second S. Gallinarum outbreak were mapped to the reference genome and annotated by using the variant effect predictor<sup>2</sup> tool within the Ensembl database and the Gal-gal5 assembly. Moreover, the genes that were located 100 kb upstream and downstream of the significant SNPs were also annotated using the BioMart data mining tool<sup>3</sup> and the Gal-gal5 assembly. We chose these 200 kb windows based on the average LD in commercial populations (less than 1 cM on average; Andreescu et al., 2007) and the fact that the chicken genome contains 250 kb per cM on average (International Chicken Genome Sequencing Consortium, 2004). This allowed us to catalog all the genes that were located in the vicinity of the identified significant SNPs and to create gene lists that contained the genes in the vicinity of all the significant SNPs identified for fowl typhoid resistance.
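The window-based annotation step can be illustrated with the sketch below, which lists genes overlapping a 100 kb flank on either side of a SNP. The function name, the gene names, and all coordinates are hypothetical; in the study the gene coordinates came from BioMart.

```python
def genes_near_snp(snp_chrom, snp_pos, genes, flank_bp=100_000):
    """Return names of genes overlapping [snp_pos - flank, snp_pos + flank].

    genes: iterable of (name, chromosome, start, end) tuples.
    """
    lo, hi = snp_pos - flank_bp, snp_pos + flank_bp
    return [name for name, chrom, start, end in genes
            if chrom == snp_chrom and start <= hi and end >= lo]


# Hypothetical gene coordinates, for illustration only
toy_genes = [
    ("GENE_A", "5", 1_000_000, 1_050_000),
    ("GENE_B", "5", 1_180_000, 1_250_000),
    ("GENE_C", "5", 1_400_000, 1_420_000),
    ("GENE_D", "1", 1_090_000, 1_110_000),
]
nearby = genes_near_snp("5", 1_100_000, toy_genes)
```

Here only GENE_A and GENE_B fall within the 200 kb window; GENE_C is too far away and GENE_D lies on a different chromosome.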

## Pathway, Network and Functional Enrichment Analyses

Identification of potential canonical pathways and networks underlying the candidate genomic regions associated with resistance to the second S. Gallinarum outbreak was performed using the Ingenuity Pathway Analysis (IPA) program<sup>4</sup>. IPA constructs multiple possible upstream regulators, pathways, and networks that serve as hypotheses for the biological mechanism underlying the phenotypes based on a large-scale causal network derived from the Ingenuity Knowledge Base. Then, IPA infers the most suitable pathways and networks based on their statistical significance, after correcting for a baseline threshold (Krämer et al., 2014). The IPA score in the constructed networks can be used to rank these networks based on the P-values obtained using Fisher's exact test [IPA score or P-score = −log<sub>10</sub>(P-value)].
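To make the scoring concrete, the sketch below computes a right-tailed Fisher's exact test for the overlap between a gene list and a pathway, and converts the P-value into a −log10 score. It is a generic illustration with hypothetical counts and function names, not a description of how IPA computes its scores internally.

```python
import math


def fisher_right_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated.

    k: genes in the list that belong to the pathway
    n: size of the gene list
    K: pathway genes in the background
    N: size of the background
    """
    total = math.comb(N, n)
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / total


def p_score(p_value):
    """Score defined in the text: -log10 of the P-value."""
    return -math.log10(p_value)


# Hypothetical counts: 3 of 5 listed genes fall in a 5-gene pathway,
# against a background of 20 genes
p_overlap = fisher_right_tail(3, 5, 5, 20)
score = p_score(p_overlap)
```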

The gene list for S. Gallinarum resistance was also analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID; Dennis et al., 2003). In order to understand the biological meaning behind these genes, gene ontology (GO) was determined and functional annotation clustering analysis was performed. The Gallus gallus background information is available in DAVID and was used for the analysis. The enrichment score (ES) of the DAVID package is a modified Fisher exact P-value calculated by the software, with higher ES reflecting more enriched clusters. An ES greater than 1 means that the functional category is overrepresented.

#### RESULTS

#### Descriptive Statistics of Phenotypes

A mean three-log difference of liver S. Gallinarum viable counts between the resistant (average: 4.4 log<sub>10</sub> CFU/g, standard deviation: 1.66) and the susceptible (average: 7.4 log<sub>10</sub> CFU/g, standard deviation: 0.77) birds from the second outbreak was detected, consistent with the pathology results. The maximum liver count measured was 8.45 log<sub>10</sub> CFU/g, while in 34 samples no viable S. Gallinarum was detected (minimum).

<sup>2</sup>http://www.ensembl.org/Tools/VEP

<sup>3</sup>http://www.ensembl.org/biomart/martview/

<sup>4</sup>www.ingenuity.com

#### Histology and Assessment of Pathogenicity

As many samples were autolysed or degraded, detailed scoring was not possible. However, analysis of tissues from six resistant and nine susceptible birds where the sample was not compromised showed patterns of pathology similar to those previously described following experimental infection of resistant and susceptible inbred lines with S. Gallinarum (Wigley et al., 2002). Resistant birds showed signs of inflammation, largely restricted to specific foci in the liver (**Figure 1A**) and general inflammation in the spleen. In contrast, susceptible birds showed greater levels of inflammation and large areas of necrotic damage in the liver (**Figure 1B**), with a high degree of inflammatory cell influx into the spleen with thickening of the splenic capsule and some areas of necrosis. These findings are consistent with observations in inbred lines exhibiting differential resistance following experimental infection (Mariani et al., 2001).

In experimental infection studies, a clonal isolate from the second outbreak was recovered in equivalent or greater numbers from the spleen and liver of orally challenged birds than 287/91 or SG9 (**Supplementary Figure S1**). This fulfills Koch's postulates and the outbreak strain may be considered typical of other S. Gallinarum strains in the pathology it elicits. None of the isolates were detected in the caecal contents at the time of post mortem examination.

#### Single-Marker Genomic Association Studies

Similar moderate heritability estimates for S. Gallinarum resistance were derived for both layer populations in the first (h<sup>2</sup> = 0.22 ± 0.01) and second (h<sup>2</sup> = 0.26 ± 0.14) outbreaks.

FIGURE 1 | Representative haematoxylin and eosin stained sections of liver tissue from resistant (A) and susceptible (B) chickens from the second outbreak (magnification × 400). The livers of susceptible birds show extensive necrotic tissue damage and massive and widespread influx of inflammatory cells, whereas resistant birds show smaller defined foci of inflammation.

Seven markers located in the SAL1 locus on chromosome 5 were found to have a significant (P < 0.05) association with S. Gallinarum resistance in the layer population affected by the first outbreak. Details of the significant markers identified are presented in **Table 1**.

Multidimensional scaling analysis revealed four substructure principal components in the layer population affected by the second outbreak, which were subsequently included in the GWAS model to correct results for population stratification.

GWAS analysis identified six SNP markers genome-wide significantly associated with the log-transformed liver load of S. Gallinarum in layers from the second outbreak on chromosomes 1, 11, 23, 24, and 26 (P-values 7.36 × 10<sup>−10</sup> to 1.63 × 10<sup>−7</sup>) (**Table 2**). Additionally, 14 SNPs crossing the suggestive genome-wide significance threshold were identified on chromosomes 1, 2, 4, 6, 13, 19, 24, and 28 (**Table 2**). The Manhattan plot and the Q-Q plot for the GWAS results are displayed in **Figures 2A,B**.

The same significant associations on chromosomes 1, 23, 26, and 28 were identified by the GWAS analysis when the data were re-analyzed as a binary (case-control) trait (**Table 2**), although the ranking of the SNPs based on the P-values was different. With the case-control analysis the association on chromosome 28 attained genome-wide significance (P-value 2.41 × 10<sup>−12</sup>). This approach also identified two new genome-wide significant associations on chromosomes 6 and 7 (P-values 8.81 × 10<sup>−9</sup> to 1.08 × 10<sup>−8</sup>) and new suggestive associations with markers on chromosomes 1, 3, 10, 11, and 23 (**Table 2**). All the significant associations identified by the GWAS were also found to be significant (P < 0.05) in the chi-square analysis. The Manhattan plot and the Q-Q plot for the GWAS results from the case-control analysis are displayed in **Figures 3A,B**. Significant SNPs that were located on the same chromosome were not in LD, with the exception of the markers located on chromosome 13 (r<sup>2</sup> > 0.90).
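The r<sup>2</sup> statistic used to assess LD between pairs of significant SNPs can be approximated from unphased genotypes as the squared correlation of allele counts. The sketch below shows that calculation; the function name and the genotype vectors are hypothetical, and it is an illustration rather than PLINK's code.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs coded as 0/1/2.

    x, y: allele counts for the same birds, in the same order.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical genotypes for three SNPs in six and four birds
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [2, 1, 0, 1, 2, 0]   # same pattern with alleles swapped
r2_linked = genotype_r2(snp_a, snp_b)
r2_unlinked = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Two SNPs whose genotypes mirror each other give r<sup>2</sup> = 1 regardless of which allele is counted, while independent genotype patterns give a value near zero.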

#### Regional Heritability Mapping

The RHM mapping confirmed the significant associations on chromosomes 1, 11, 23, 24, and 26 previously identified by the GWAS (**Supplementary Table S2**). Moreover, RHM detected two more suggestive significant associations on chromosomes 2 and 11. Details of the significant SNP windows are presented in **Supplementary Table S2**. The Manhattan plot and the Q-Q plot for the RHM results analysis are displayed in **Supplementary Table S2**.

TABLE 1 | List of SNPs associated with fowl typhoid resistance in the layer population from the first outbreak.

SNP markers in bold are spanning the AKT1 gene.

#### SNP and Candidate Region Annotation

All of the significant markers identified for the first outbreak were located in intronic, intergenic, upstream and downstream gene regions with the exception of one SNP (SNP7) which corresponds to a missense variant within the Creatine Kinase B (CKB) gene. Four of the significant SNP markers spanned the AKT1 gene. All these SNPs are intronic variants for the AKT1 and also upstream and downstream variants for one microRNA (gga-mir-1771). The candidate region for fowl typhoid on chromosome 5 contained 16 protein coding genes and 2 microRNAs (**Supplementary Table S3**).

Most of the significant SNPs identified by the GWAS analyses for the second outbreak were located in intronic (34%), intergenic (24%) and upstream and downstream gene (14%) regions. However, four of the SNPs were localized in exonic regions. Specifically, Affx-51116866 corresponds to a missense variant within the Cell Adhesion Molecule 1 (CADM1) gene; Affx-51148005 corresponds to a missense variant within the TATA-Box Binding Protein Associated Factor 8 (TAF8) gene; Affx-51686897 corresponds to a missense variant within the AT-Rich Interaction Domain 5B (ARID5B) gene; Affx-51177949 corresponds to a synonymous variant within the Growth Differentiation Factor 3 (GDF3) gene. The above mentioned missense variants had a predicted moderate impact.

Most of the candidate regions for fowl typhoid resistance identified from the second outbreak contained multiple genes. In total, 116 protein-coding genes and 4 microRNAs were identified across the QTL regions for the second outbreak (**Supplementary Table S3**).

### Pathway, Network and Functional Enrichment Analyses

We reasoned that the corresponding QTL regions may contain genes contributing to a common pathway associated with S. Gallinarum resistance. We therefore identified the sets of annotated genes lying within the QTL intervals identified for the second outbreak and sought evidence of gene set enrichment. These genes were enriched for pathways involved in immune responses, both innate and adaptive, and cell-cycle regulation (**Figure 4**). The most enriched pathway was related to PI3K/AKT signaling. Moreover, three networks of molecular interactions related to cell death and survival, and cell cycle, humoral immune response, hematological system development and function, and hematopoiesis were constructed using the list of genes in the candidate regions (**Figure 5**).

Functional annotation clustering analysis revealed the presence of enriched gene clusters related to protein kinase binding (ES = 2.35, genes in the cluster: PRKRIP1, CCND3, YWHAG, HDAC1), positive regulation of immune system processes (ES = 1.7, genes in the cluster: CD247, SH2B2, CADM1, HPX, LCK), and hematopoiesis and immune system development (ES = 1.1, genes in the cluster: CEBPA, CEBPG, LCK, RTKN2).

TABLE 2 | List of SNPs associated with fowl typhoid resistance in the layers from the second outbreak.

P-value from genomic association study (genome-wide significant in bold, suggestive significance otherwise).

#### DISCUSSION

Our study set out to investigate the genetic basis of fowl typhoid resistance in commercial layers. Using samples from two natural disease outbreaks, we detected heritable genetic variation and identified genomic regions associated with resistance to the disease in two different layer populations. Putative candidate genes, canonical pathways and networks involved in the underlying molecular mechanisms of fowl typhoid resistance were also identified.

In terms of phenotype, there was on average a 3 log<sub>10</sub> difference in the recovery of viable S. Gallinarum between resistant and susceptible birds from the second outbreak and differences in pathology that are consistent with those observed following experimental infection of inbred lines that exhibit heritable differences in resistance following oral S. Gallinarum or intravenous S. Typhimurium inoculation (Bumstead and Barrow, 1993; Mariani et al., 2001). Although much of the QTL-based mapping of the SAL1 locus in inbred lines used intravenous infection of day-old chicks with S. Typhimurium, the phenotype of resistance to experimental fowl typhoid is strongly expressed in older birds (Bumstead and Barrow, 1993), with quantitative differences of 3–4 log<sub>10</sub> CFU per gram of liver tissue between resistant and susceptible lines found 8 days after oral challenge with S. Gallinarum in 3-week-old birds (Wigley et al., 2002). Therefore, the pathology, phenotyping and histological results of the present study conducted in commercial layers are consistent with previous findings in inbred lines for fowl typhoid infection.

… function, hematopoiesis), (B, related to cell-to-cell signaling and interaction, cellular compromise, cellular development), and (C, related to cell cycle, cell death and survival, cellular development) illustrate molecular interactions between products of candidate genes selected from the QTL regions for fowl typhoid resistance in the layer population affected by the second outbreak. Arrows with solid lines represent direct interactions and arrows with broken lines represent indirect interactions. Genes with white labels are those added to the IPA analysis because of their interaction with the target gene products.

In addition, the present study provided further evidence for the role of the SAL1 locus in Salmonella resistance. AKT1 is a promising candidate gene in this QTL region, as the protein is known to be activated by Salmonella and to promote intracellular net replication of the bacteria in mammalian cells (Steele-Mortimer et al., 2000; Kuijl et al., 2007). In the first outbreak the markers with the most significant association with fowl typhoid resistance spanned the AKT1 gene. Although significant associations with SNPs located in the SAL1 locus were not identified by the genome-wide scan for layers from the second outbreak, pathway analysis revealed PI3K/AKT signaling as the most significant pathway, implying that the AKT pathway might play a role in Salmonella resistance. It is possible that other genes that are part of the PI3K/AKT pathway, such as JAK3, KRAS, GYS2, PPP2CA, and YWHAG, might contribute to fowl typhoid resistance in the layers of the second outbreak, since they belong to a different selection line and SNP markers proximal to these genes were identified in the GWAS analysis. Therefore, although the underlying mechanism might be similar, the causative mutation(s) might be different in the two populations. In addition, the phase of LD between the SNP markers and the causative mutation(s) might be different in the two different layer populations. AKT is a serine/threonine kinase that modulates multiple processes, in particular apoptosis, cell proliferation, and development (Hers et al., 2011). Depending on the cell type and stage of infection, apoptosis may play both positive and negative roles in control of Salmonella infection (Fink and Cookson, 2007). Nevertheless, the involvement of the other striking candidate gene, the CD27-binding protein SIVA1, in fowl typhoid resistance could not be excluded, since the two candidate genes are in close proximity and significant markers were detected on either side of these genes. SIVA1 is a pro-apoptotic factor that induces cell death via a caspase-dependent pathway in human and murine cells (Prasad et al., 1997; Py et al., 2004). It has also been proposed that differences in the expression or function of SIVA1 in the progeny of advanced inter-cross chicken lines may explain differences in the ability of heterophils from such birds to release heterophil extracellular traps via an apoptosis-like pathway (Redmond et al., 2011).

This is the first study, to our knowledge, that aimed to dissect the genetic architecture of fowl typhoid resistance using data from natural disease outbreaks. However, there are many previous genetic studies of systemic salmonellosis, Salmonella enteric carriage, carrier-state and antibody responses based on challenge experiments with S. Enteritidis and S. Typhimurium in crosses of inbred lines, and crosses of inbred with commercial chicken lines. Interestingly, many of the previously identified QTLs overlap or are in close proximity with the ones identified in the present study. The two QTLs we identified on chromosome 1 at positions 67.5 and 91.5 Mb are located close to previously reported QTLs: the former to one identified in inbred lines for cloacal bacterial burden after oral challenge with S. Enteritidis (Tilquin et al., 2005) and the latter to one identified in broiler crosses for spleen bacterial burden after intra-oesophageal challenge (Kaiser and Lamont, 2002) and for vaccine response after subcutaneous challenge with S. Enteritidis (Kaiser et al., 2002). The QTLs on chromosome 1 (194.5 Mb) and chromosome 11 (20 Mb) overlap with QTLs found in inbred line crosses for carrier Salmonella state after oral challenge with S. Enteritidis (Calenge et al., 2011). Likewise, the QTL regions on chromosome 2 (122 Mb) and 4 overlap with previously identified QTLs for spleen bacterial burden after intra-oesophageal challenge with S. Enteritidis (Malek et al., 2004). The QTL on chromosome 3 overlaps with a QTL identified in advanced intercrosses of inbred lines with broilers for spleen bacterial burden after intra-oesophageal infection with S. Enteritidis (Hasenstein and Lamont, 2007). In the latter study, the gallinacin group of genes were considered good candidate genes for Salmonella resistance. The gallinacin-8 precursor (AvBD8) gene is also in close proximity with the significant marker identified on chromosome 3 in the present study. However, more studies are needed to confirm if this is the actual causative gene for this QTL. The QTLs that we identified on chromosomes 7, 19, 23, 24, and 26 are co-localized with previously identified QTLs in inbred chicken line crosses for S. Enteritidis caeca colonization after oral inoculation (Thanh-Son et al., 2012). Many immune genes (such as LAT2/NTAL, TRAF3IP3, IRF6) located within these QTL regions have been suggested as good candidate genes for Salmonella resistance.

The present study implemented a much higher density genome-wide genotyping platform compared to all the previous ones and was able to identify some novel QTLs. Moreover, two different approaches, GWAS and RHM, were implemented to further facilitate the QTL discovery. GWAS performs single marker analyses while RHM fits genomic regions of multiple SNPs as a single measure. Therefore, RHM has greater power compared to GWAS to identify loci where several alleles with small effects are segregating. In addition, we implemented two different GWAS models, one using binary phenotypes and the other the continuous phenotypes. We used the binary phenotypes to be consistent with the phenotypes used to analyse the first outbreak, and the continuous ones to increase further the power of the study and overcome putative errors derived from misclassifications of cases and controls. The marker on chromosome 28 found to have the most significant association with fowl typhoid resistance, when the trait was analyzed as binary, is surrounded by many putative good candidate genes. Such genes related to the immune response are the tyrosine-protein Janus kinase 3 (JAK3), the CREB regulated transcription coactivator 1 (CRTC1) and the cytokine receptor like factor 1 (CRLF1). The IPA analysis identified two canonical pathways related to JAK signaling among the most enriched pathways in this dataset: the JAK1 and JAK3 in the γc cytokine regulation signaling and the JAK-Stat signaling. In addition, the immune related network with the highest IPA score had the JAK3 protein as one of its central molecules. The JAK signaling family of tyrosine kinases is involved in cytokine receptor-mediated intracellular signal transduction. Specifically, JAK3 mediates essential signaling events in both innate and adaptive immunity and plays a crucial role in hematopoiesis during T-cell development (Yamaoka et al., 2004). Multiple markers on chromosome 13 were found to have a significant association with fowl typhoid resistance. These markers span the follistatin-related protein 4 precursor (FSTL4) gene, which is related to calcium metabolism and transportation. However, in close proximity (<0.5 Mb), immune genes of interest such as the Interleukin 3 precursor (IL-3), Interleukin 5 precursor (IL5), and the Interferon Regulatory Factor 1 (IRF1) are located. The protein encoded by the IRF1 gene is a transcriptional regulator and tumor suppressor, serving as an activator of genes involved in both innate and acquired immune responses. The encoded protein activates the transcription of genes involved in the body's response to viruses and bacteria, playing a role in cell proliferation, apoptosis, immune and DNA damage response (Yoshida and Azuma, 1992; Taniguchi et al., 1995). In addition, it is involved in the regulation of interferon (IFN) and IFN-inducible genes that have been reported to be involved in host resistance to Salmonella infection (Thanh-Son et al., 2012).

## CONCLUSION

We confirmed that resistance to fowl typhoid is a heritable complex polygenic trait. Co-localisation of many of the QTLs identified for fowl typhoid resistance with previous ones identified for systemic and enteric salmonellosis, and antibody responses, implies that common underlying mechanisms of resistance to different Salmonella serovars are segregating across chicken populations. These findings strengthen the interest of these regions for more refined analyses. According to our results, breeding for enhanced fowl typhoid resistance in layers is possible. Although genomic selection is a valid approach to enhance disease resistance in chickens, as has been reported previously (Legarra et al., 2011), identification of the causative genes and mutations could expedite selection through different weighting of the validated selectable markers or precision breeding. However, further studies are needed to identify the causative genes and mutations.

## DATA AVAILABILITY STATEMENT

All data are available at Figshare (doi: 10.6084/m9.figshare.7205702).

## AUTHOR CONTRIBUTIONS

MF, MS conceived and designed the genetic studies of fowl typhoid and secured funding. AP and OM performed the genetic parameter analysis, collated and edited the genotyping data, and performed the single marker genomic analysis. AP and ES-M performed the regional heritability mapping. AP performed the pathway analysis and wrote the manuscript with input from KR. PW performed the histological analysis and experimental infections. MF, MS, JF, KR, and AP interpreted these results. All other co-authors provided manuscript editing and feedback. All authors read and approved the final manuscript.

## FUNDING

The authors gratefully acknowledge funding from the Biotechnology & Biological Sciences Research Council (BBSRC) and Hy-Line International (BB/J015296/1) and BBSRC strategic investment at The Pirbright Institute (BB/J016837/1 and BB/P013740/1) and The Roslin Institute (BB/J004227/1 and BBS/E/D/20002172).

#### ACKNOWLEDGMENTS

We would like to extend our gratitude and thanks for the help of the vets collecting the samples and performing the clinical analyses, and to the farmers who allowed us to use these samples for research purposes.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00519/full#supplementary-material

#### REFERENCES

regions controlling resistance to Salmonella colonization and carrier-state. BMC Genomics 13:198. doi: 10.1186/1471-2164-13-198


**Conflict of Interest Statement:** JF was employed by Hy-Line International.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Psifidi, Russell, Matika, Sánchez-Molano, Wigley, Fulton, Stevens and Fife. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Sequence Characterization of DSG3 Gene to Know Its Role in High-Altitude Hypoxia Adaptation in the Chinese Cashmere Goat

Chandar Kumar<sup>1,2</sup>, Shen Song<sup>1</sup>, Lin Jiang<sup>1</sup>, Xiaohong He<sup>1</sup>, Qianjun Zhao<sup>1</sup>, Yabin Pu<sup>1</sup>, Kanwar Kumar Malhi<sup>3</sup>, Asghar Ali Kamboh<sup>3</sup> and Yuehui Ma<sup>1</sup>\*

<sup>1</sup> The Key Laboratory for Farm Animal Genetic Resources and Utilization of Ministry of Agriculture of China, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China, <sup>2</sup> Department of Animal Breeding and Genetics, Faculty of Animal Husbandry and Veterinary Sciences, Sindh Agriculture University, Tando Jam, Pakistan, <sup>3</sup> Department of Veterinary Microbiology, Faculty of Animal Husbandry and Veterinary Science, Sindh Agriculture University, Tando Jam, Pakistan

The Tibetan cashmere goat is one of the main goat breeds used by people living on the plateau. It exhibits phenotypic characteristics distinct from those observed in lowland goats, allowing it to adapt to the challenging conditions at high altitudes. It provides an ideal model for understanding the genetic mechanisms underlying high-altitude adaptation and hypoxia-related diseases. Our previous exome sequencing of five Chinese cashmere breeds revealed a candidate gene, DSG3 (Desmoglein 3), responsible for the high-altitude adaptation of the Tibetan goat. However, the whole DSG3 gene (44 kbp), consisting of 16 exons in the goat genome, was not entirely covered by the exome sequencing. In this study, we resequenced all 16 exons of the DSG3 gene in ten Chinese native goat populations. Twenty-seven SNP variants were found between the lowland and highland goat populations. The genetic distance (FST) of significant SNPs between the lowland and highland populations ranged from 0.42 to 0.58. By using correlation coefficient analysis, linkage disequilibrium, and haplotype network construction, we found three non-synonymous SNPs (R597E, T595I, and G572S) in exon 5 and two synonymous SNPs in exons 8 and 16 of DSG3. These mutations significantly segregated high- and low-altitude goats into two clusters, indicating the contribution of DSG3 to high-altitude hypoxia adaptation in the Tibetan goat.

#### Edited by:

John Anthony Hammond, Pirbright Institute (BBSRC), United Kingdom

#### Reviewed by:

Shaojun Liu, Hunan Normal University, China

Keith Ballingall, Moredun Research Institute, United Kingdom

> \*Correspondence: Yuehui Ma yuehui.ma@263.net

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 05 June 2018 Accepted: 29 October 2018 Published: 19 November 2018

#### Citation:

Kumar C, Song S, Jiang L, He X, Zhao Q, Pu Y, Malhi KK, Kamboh AA and Ma Y (2018) Sequence Characterization of DSG3 Gene to Know Its Role in High-Altitude Hypoxia Adaptation in the Chinese Cashmere Goat. Front. Genet. 9:553. doi: 10.3389/fgene.2018.00553

Keywords: DSG3, exons, SNPs, Tibetan goat, hypoxia, high-altitude adaptation

## INTRODUCTION

Low oxygen, or hypoxia, is the hardest environmental challenge for humans and animals living at high altitude. Whole-genome resequencing analyses have been carried out to investigate the genetic basis of hypoxia adaptation in many domestic animals, such as yak (Qiu et al., 2012), Tibetan pig (Ai et al., 2014), dog (Li et al., 2014), and birds (Cai et al., 2013). Various candidate genes have been identified for high-altitude adaptation, including EPAS1 (endothelial PAS domain protein 1) and HBB (hemoglobin beta) (Gou et al., 2014; Fan et al., 2015; Song et al., 2016). In prior studies, both EPAS1 and HBB revealed six non-synonymous mutations potentially affecting gene function and influencing high-altitude hypoxic adaptation in dogs (Fan et al., 2015). These studies show that domestic animals are useful models for exploring high-altitude adaptation.

The domestic goat is found from sea level (30 m) to the high plateau (4700 m). In contrast to its low-altitude counterpart, the Tibetan goat has unique anatomical and physiological characteristics, such as a higher hemoglobin concentration, a larger heart, and bigger lungs, that equip it to live at high altitudes (Li et al., 2004b). Our previous exome-sequencing analysis also revealed that cardiovascular system-related genes may play a crucial role in high-altitude adaptation (Song et al., 2016). Among the candidate genes under strong selection, the DSG3 (Desmoglein 3) locus was the most significant. DSG3 is a desmoglein gene located in a cluster on goat chromosome 24; it spans 44 kbp and comprises 16 exons. It is expressed in stratifying epithelia, but its precise role in the cardiovascular system is not fully understood (Delva et al., 2009). Here, we extended our previous analyses to large goat populations from different altitudes to determine whether the DSG3 candidate gene has a role in high-altitude hypoxia adaptation. By using correlation coefficient analysis, linkage disequilibrium, and haplotype network construction, we explored both the non-synonymous and synonymous mutation sites of the DSG3 gene. These mutations significantly segregated goat populations into high- and low-altitude groups.

## MATERIALS AND METHODS

## Populations Sampling

A total of 128 Chinese cashmere goats were used for Sanger sequencing, which covered the entire genomic region of DSG3 (44 kbp), consisting of 16 exons. Genomic DNA from Ritu (RT, 4700 m), Bange (BG, 4000 m), Nanjiang (NJ, 1700 m), Inner Mongolia (IM, 1500 m), and Liaoning (LN, 30 m) cashmere goats was taken from the 330 individuals previously analyzed by exome sequencing on the Illumina HiSeq 2000 platform (Song et al., 2016); these individuals were randomly selected from 4 different locations (**Table 1** and **Supplementary Table S1** and **Supplementary Figure S1**). Furthermore, five goat populations, Dulan (DL, 3000 m), Hanshan (HS, 1000 m), Guangfeng (GF, 500 m), Hainan (HN, 120 m), and Changjiangsanjiaozhou (CZ, 50 m), were selected from five different geographical locations in China (**Table 1** and **Supplementary Table S1** and **Supplementary Figure S1**). Tissue samples were taken from the ear using an ear-cutting instrument, without applying an anesthetic or analgesic agent. After collection, the tissue samples were preserved in tubes containing 75% alcohol, which were kept in an ice box during transportation to the laboratory. The tissue samples were collected from all animals in the experiment according to the recommendations and guidelines of the Institute of Animal Sciences (IAS, CAAS), with full approval from the Animal Care and Use Committee of the Chinese Academy of Agricultural Sciences and the Ministry of Agriculture of the People's Republic of China.

Genomic DNA was isolated from ear tissue using a modified commercial kit protocol (Wizard<sup>®</sup>, Promega). The quality and integrity of the purified DNA from each goat per sample location were determined using a NanoDrop 2000 (Thermo Fisher Scientific, DE), and the samples were pooled according to concentration. Validation in the larger goat population used the following samples and locations. The highland group consisted of Ritu (RT, n = 17) and Bange (BG, n = 17) goats from the Ritu and Bange regions of Tibet. The lowland group consisted of Dulan (DL, n = 8) and Nanjiang (NJ, n = 14) cashmere goats from the Qinghai, Dulan, and Aksu areas of the Xinjiang region, Inner Mongolia (IM, n = 13) goats from the Earlangshan region of Inner Mongolia, Hanshan (HS, n = 8) goats from Inner Mongolia, Guangfeng (GF, n = 15) goats from Shangrao, Jiangxi, Hainan (HN, n = 8) goats from Hainan city, Changjiangsanjiaozhou (CZ, n = 13) goats from Changjiangsanjiaozhou city, and Liaoning (LN, n = 15) cashmere goats from Gai Zhou, Liaoning, as shown in **Table 1**, **Supplementary Table S1** and **Supplementary Figure S1**.

### Polymerase Chain Reaction (PCR) and Single-Nucleotide Polymorphisms (SNPs) detection

Sixteen pairs of primers were used for DSG3 gene amplification; their sequences are shown in **Table 2**. Polymerase chain reaction (PCR) was optimized with a 50 ng DNA template using 2x Taq master mix (BioMed: Beijing Co., Ltd.), ddH2O, and 10 pmol/l of forward and reverse primers in a 30 µl mixture. The PCR was performed with an initial denaturation for 5 min at 95 °C, followed by 40 cycles of 95 °C for 30 s, annealing at 60–63 °C for 30 s, and extension at 72 °C for 45 s, with a final extension at 72 °C for 5 min; products were stored at 4 °C. The PCR products were analyzed by electrophoresis on a 1.5% gel made from 0.75 g Regular Agarose Biowest (Gene Company) and 50 ml of 1x TAE buffer, to which 5 µl of GoldView was added for viewing the gel on a UV transilluminator. The electrophoresis tank (Bio-Rad Sub-Cell<sup>®</sup> GT) and the PowerPac Basic (Bio-Rad<sup>®</sup>) were used for running the electrophoresis. Five microliters of each PCR product was mixed with 10X loading dye, which is composed of 0.25% bromophenol blue, 0.25% xylene cyanol FF, 15% Ficoll, and H2O, and the mixture was loaded into a well of the 1.5% gel. The electrophoresis was run in 1X TAE buffer at a constant 75 W and 147 V for 40 min. When electrophoresis was complete, the gels were placed on the UV transilluminator. All the purified PCR products were directly sequenced by the Sanger method on an ABI 3730 sequence analyzer (Applied Biosystems, United States) in both directions and using the forward primer. The sequences were assembled and analyzed for single-nucleotide polymorphisms (SNPs) using the SeqMan program of the Lasergene software. The SNPs were detected from Primer 4, Primer 5, Primer 6, Primer 8, Primer 9, Primer 11, Primer 14, and Primer 16 in the DSG3 gene through a DNA pooling strategy. Furthermore, these primers were used in large populations on an individual basis to identify polymorphisms and validate the SNPs.
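As a quick consistency check on the protocol figures quoted above, the following minimal Python sketch recomputes the agarose concentration and the total programmed hold time of the thermal profile. The function names are hypothetical and introduced only for illustration; ramping between temperatures and the final 4 °C storage step are deliberately ignored, so the real run time on a given thermocycler will be longer.

```python
# Minimal sketch: arithmetic checks on the gel and cycling values quoted in the text.
# Function names are illustrative assumptions, not part of any published protocol.


def gel_percent_w_v(agarose_g: float, buffer_ml: float) -> float:
    """Agarose concentration as % (w/v), i.e. grams per 100 ml of buffer."""
    return 100.0 * agarose_g / buffer_ml


def programmed_seconds(initial_s: int, cycles: int, step_s: tuple, final_s: int) -> int:
    """Total programmed hold time; ramp times and the 4 °C hold are ignored."""
    return initial_s + cycles * sum(step_s) + final_s


# 0.75 g agarose in 50 ml of 1x TAE buffer
gel_pct = gel_percent_w_v(0.75, 50.0)

# 5 min at 95 °C, then 40 cycles of 30 s + 30 s + 45 s, then 5 min at 72 °C
total_s = programmed_seconds(5 * 60, 40, (30, 30, 45), 5 * 60)

print(f"gel: {gel_pct:.2f}% (w/v)")
print(f"cycling: {total_s} s = {total_s / 60:.0f} min")
```

The stated quantities are mutually consistent: 0.75 g in 50 ml gives the 1.5% gel named in the text, and the programmed holds add up to 4800 s (80 min).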

#### Statistical Analyses

Pairwise genetic distance (FST) was calculated between High-Lowland, High-Midland, and Mid-Lowland goat populations to assess genetic divergence, as described by Weir and Cockerham (1984). Fisher's exact test was used to assess statistical significance. The LD-block structure and parameters (D′ and r<sup>2</sup>) were computed using Haploview 4.2 software (Barrett et al., 2005). To explore the signature of selection and evolutionary change, the six SNPs (four non-synonymous and two synonymous variants) were analyzed for haplotype diversity, and haplotype frequencies were estimated using PHASE 2.1 (Stephens et al., 2001; Crawford et al., 2004). To obtain a reliable result, we used the -X10 option to increase the length of the final run and the -C option for the low- and highland goat groups to ensure that any similarity to the haplotype is not taken into account. The haplotypes were imported into DnaSP version 5 (Librado and Rozas, 2009) for network construction, and phylogenetic diversity among the identified haplotypes was inferred through a median-joining network analysis using the method recommended by Bandelt et al. (1999). To predict the functional effects of the three major non-synonymous SNPs in the DSG3 gene, the SIFT software<sup>1</sup> was used. The protein sequence of Tibetan goats was compared with the reference protein sequence of goat desmoglein-3. The cutoff value in the SIFT program is a tolerance score of ≥ 0.05. The higher the tolerance score, the lower the expected functional impact of a particular amino acid substitution.

TABLE 1 | Sample locations and major allele frequency of three non-synonymous variants and two synonymous mutations in DSG3 gene.

#### TABLE 2 | Sequences of the primers.
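To make the idea of a pairwise FST between two altitude groups concrete, here is a minimal Python sketch for a single biallelic SNP. It is an assumption-laden simplification: it uses the classical heterozygosity form FST = (HT − HS) / HT with equal weighting of the two groups, not the Weir and Cockerham (1984) estimator used in the study, which additionally corrects for sample size. The function names are hypothetical, and the example frequencies are the R597E values quoted in the Results for RT (0.97) and CZ (0.00).

```python
# Minimal sketch of a two-population FST for one biallelic SNP.
# NOTE: this is the simple heterozygosity-based form, NOT the Weir and
# Cockerham (1984) estimator; it ignores sample sizes entirely.


def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def wright_fst(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for two equally weighted populations."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_t = heterozygosity(p_bar)  # total expected heterozygosity
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall: no differentiation
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population
    return (h_t - h_s) / h_t


# Variant allele frequency of R597E in RT (highland) and CZ (lowland), from the text
fst_rt_cz = wright_fst(0.97, 0.00)
print(f"FST (RT vs CZ, simplified): {fst_rt_cz:.3f}")
```

Because this compares two single populations rather than the pooled altitude groups, and omits the sample-size correction, its output should not be expected to match the values reported in **Table 4**.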

#### RESULTS

#### Allele Frequency and Genetic Differentiation

In this study, we sampled 128 Chinese indigenous goats belonging to ten populations living across a wide range of altitudes, from sea level (30 m) to high altitude (4700 m) (**Table 1** and **Supplementary Table S1**). To identify polymorphisms and measure allele frequencies of the DSG3 gene accurately, we resequenced the entire genomic region of the DSG3 gene. Among the 27 SNPs identified in DSG3, five SNPs showed the most significant differentiation in the analysis of allele frequency and global FST: three non-synonymous SNPs, SNP1 (R597E, Chr24: 25794694, exon 5), SNP2 (T595I, Chr24: 25794695, exon 5), and SNP3 (G572S, Chr24: 25794771, exon 5), and two synonymous SNPs, SNP4 (Chr24: 25799255, exon 8) and SNP5 (Chr24: 25817330, exon 16) (**Table 3**). A striking change in allele frequency with increasing altitude was observed across the goat populations. For example, low R597E allele frequencies in DSG3 were found in lowland populations including CZ (0.00, 50 m), HN (0.00, 120 m), and HS (0.00, 1000 m), but were elevated in the IM (0.65, 1500 m), DL (0.56, 3000 m), and BG (0.67, 4000 m) populations, and reached the highest frequency in the RT (0.97, 4700 m) population (**Table 1**, **Supplementary Table S2**, and **Figure 1A**). The other four SNPs in DSG3 exhibited a similar pattern of change (**Table 1**, **Supplementary Table S2**, and **Figures 1**, **2**). All these results indicated that the five variants in the DSG3 locus remain rare in the lowland goat populations but increase in frequency with altitude.

Using the allele frequencies of the three non-synonymous and two synonymous SNPs measured above, we calculated the correlation coefficient between variant allele frequency per population and the altitude of the sampling location using GraphPad Prism 5 software. This analysis showed a significant positive linear correlation between altitude and the frequency of the variant allele in DSG3 (**Table 3** and **Supplementary Table S3**). The linear correlation coefficient between altitude and variant allele frequency was r = 0.85, P < 0.05 for SNP1 (**Figure 1B**), r = 0.83, P < 0.05 for SNP2 (**Figure 1D**), and r = 0.84, P < 0.05 for SNP3 (**Figure 1F**). The linear correlation coefficient was r = 0.91, P < 0.05 for SNP4 (**Figure 2B**), and r = 0.90, P < 0.05 for SNP5 (**Table 3**, **Supplementary Table S3**, and **Figure 2D**). The significant positive correlation between altitude and variant allele frequency at the DSG3 locus in ten Chinese native goat populations strongly suggested that the candidate gene DSG3 potentially contributed to high-altitude adaptation in goats.
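The correlation analysis described above can be sketched in a few lines of Python. This is only an illustration of the Pearson calculation, not a reproduction of the GraphPad Prism analysis: it uses the seven R597E (SNP1) frequencies and altitudes quoted in the Results text, whereas the study used all ten populations, so the resulting coefficient is not expected to equal the reported r = 0.85. The function name `pearson_r` is an assumption introduced here.

```python
import math

# Minimal sketch of a Pearson correlation between sampling altitude and
# variant allele frequency. Data: the seven R597E values quoted in the text
# (CZ, HN, HS, IM, DL, BG, RT); the remaining three populations are omitted
# because their frequencies are not given in the running text.
altitude_m = [50, 120, 1000, 1500, 3000, 4000, 4700]
allele_freq = [0.00, 0.00, 0.00, 0.65, 0.56, 0.67, 0.97]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(altitude_m, allele_freq)
print(f"r (7 of 10 populations) = {r:.2f}")
```

With only ten (or here seven) population-level points, the coefficient is sensitive to individual populations, which is worth keeping in mind when reading the P-values.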

To further understand the genetic differentiation that occurred between the lowland and highland goat populations, we first estimated the global FST value for all the identified loci. The mean FST value was 0.24, indicating high genetic divergence among these ten Chinese native goat populations, which is mainly attributed to the divergence between the lowland and highland goat populations. The global FST of all 27 loci ranged from 0.0 to 0.61 (**Supplementary Table S2**). The three most divergent non-synonymous SNPs had values of 0.51 (SNP1), 0.52 (SNP2), and 0.55 (SNP3), whereas the top two synonymous SNPs had a global FST of 0.47 (SNP4) and 0.61 (SNP5) (**Table 3** and **Supplementary Table S2**). Subsequently, the pairwise genetic distance (FST) was measured between High-Low, High-Mid, and Mid-Low altitude goat populations (**Table 4**). The FST values of non-synonymous variants between high-altitude and low-altitude populations were 0.42 (SNP2, P < 0.001) and 0.45 (SNP1/SNP3, P < 0.001), which reflected a remarkable genetic differentiation (**Table 4**). Similarly, the FST values of the two synonymous substitutions were 0.47 (SNP4, P < 0.001) and 0.58 (SNP5, P < 0.001). These results showed higher genetic differentiation between High-Low altitude goat populations than between the High-Mid and Mid-Low altitude goat populations (**Table 4**). Taken together, both the global FST values and the pairwise genetic distance (FST) values demonstrate a remarkable population differentiation between the high- and low-altitude goat populations.

<sup>1</sup>http://sift.bii.a-star.edu.sg/

FIGURE 2 | The pattern of allele frequency of SNP4 (A) and SNP5 (C) of DSG3 changes continuously from sea level at 30 m (LN goat population) to an elevation of 4700 m (RT goat population). Blue, frequency of the mutant allele; green, frequency of the reference allele in each goat population. Plots of the correlation analysis between major allele frequency of SNP4 (B) and SNP5 (D) and the sampling locations of the 10 cashmere goat populations; r represents the correlation coefficient (r = 0.9061, P < 0.05). Asterisks (∗∗∗) indicate extreme significance.

#### Linkage Disequilibrium (LD) and Haplotype Network Analysis

The LD haploblock pattern was used to analyze the DSG3 locus (**Supplementary Figure S2**). We constructed LD blocks using the five significant SNPs and calculated D′, r<sup>2</sup>, and the logarithm of the odds (LOD). The variant alleles of these five SNPs were tightly linked as a haploblock on chromosome 24 in highland goat populations (RT-BG), with the three non-synonymous SNPs in the main position, and showed strong correlation (D′ = 1.0, r<sup>2</sup> = 0.63–1.00, and LOD = 5.19–9.82) (**Table 5** and **Figure 3A**). In contrast, the three non-synonymous SNPs and two synonymous SNPs were only loosely linked in lowland goat populations (DL-NJ-AB-LN-GF-CZ-HN-HS), where the LD level varied: D′ = 0.58–1, r<sup>2</sup> = 0.10–0.94, and LOD = 2.85–31.71 (**Table 5** and **Figure 3B**).

The haplotype pattern was analyzed to detect the effect of natural selection on selected genes, including DSG3. Twelve haplotypes were obtained from four non-synonymous SNPs (SNP1/R392Q, SNP2/T595I, SNP3/G572S, and T480N) and two synonymous SNPs (SNP4 and SNP5) in 128 goats from lowland and highland goat populations. The top three haplotypes were HL1, HL2, and L7, with frequencies of 42%, 40%, and 30%, respectively (**Figure 3C**). Among these haplotypes, HL1 showed a remarkably higher frequency of the mutant haplotype (TCCCAA) in the highland population than in lowland populations (18.0%). In contrast, HL2 showed a much lower frequency of the reference haplotype (CGTAGG) in the highland (7.0%) than in the lowland population (42%). L7 (CGTCGG) had a haplotype frequency of 31% in the lowland and formed a separate haplotype cluster in lowland populations. These results suggested that the haplotype TCCCAA of the DSG3 gene has been selected during the high-altitude adaptation of the Tibetan goat.

TABLE 4 | Pairwise genetic distances (FST) & Fisher exact test (p-value) calculated between highland and reference goat populations.

Top five SNPs are shown with asterisk.

TABLE 5 | Pairwise linkage disequilibrium of the non-synonymous and synonymous SNPs of DSG3 in the highland and lowland goat populations.
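For readers less familiar with the two LD statistics reported in this section, the following minimal Python sketch computes |D′| and r² for a pair of biallelic loci from phased haplotype frequencies. It follows the standard textbook definitions rather than Haploview's internal implementation (which also estimates haplotype phase and LOD), and the function name and input frequencies are hypothetical, chosen only to illustrate how D′ can equal 1 while r² stays well below 1, as in the highland range quoted above.

```python
# Minimal sketch of pairwise LD statistics for two biallelic loci, from
# phased haplotype frequencies. Standard definitions only; this is not
# Haploview's code and does not compute LOD scores.


def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (|D'|, r^2).

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele B occurs only on haplotypes that carry A,
# but A is twice as common as B.
d_prime, r2 = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.25)
print(f"|D'| = {d_prime:.2f}, r^2 = {r2:.2f}")
```

In this toy case one of the four possible haplotypes is absent, so |D′| reaches 1, while r² is lower because the two alleles differ in frequency.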

## Prediction of Three Non-synonymous SNPs

abbreviation: HL, high and low altitude; H, high altitude; L, low altitude.

An online tool, SIFT, was used to investigate the potential impact of the non-synonymous substitutions on protein structure and function. SIFT aligns a query sequence with subject amino acid sequences to determine the effect of an amino acid substitution. It takes protein FASTA sequences as input and aligns them with PSI-BLAST. In our analysis, three nsSNPs were predicted using SIFT, as shown in **Table 6**.

nucleotide differences. Circles are color code according to the population. Blue: highland (RT and BG), Green: lowland (LN, CZ, HN, GF, HS, IM, NJ, and DL),


Amino acid substitutions changed in Tibetan goat protein sequences may have been predicted to modify the function of DSG3 gene (low certainty). The cutoff value in the SIFT program is a tolerance score of ≥ 0.05. The higher the tolerance score, the less functional impact a particular amino acid substitution is expected to have.
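The cutoff rule stated above can be written as a small Python helper. This is a sketch of the threshold logic only: it does not call SIFT, the function name and labels are assumptions, and the example scores are hypothetical placeholders rather than the values from **Table 6**.

```python
# Minimal sketch of the SIFT cutoff described in the text: a tolerance score
# >= 0.05 is read as tolerated, a lower score as likely to affect function.
# Labels and example scores are illustrative assumptions.

SIFT_CUTOFF = 0.05


def classify_sift(score: float) -> str:
    """Map a SIFT tolerance score in [0, 1] to a coarse prediction label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores lie between 0 and 1")
    return "tolerated" if score >= SIFT_CUTOFF else "affects function"


# Hypothetical scores for three substitutions (not taken from Table 6)
example_scores = {"substitution_1": 0.01, "substitution_2": 0.05, "substitution_3": 0.40}
predictions = {name: classify_sift(s) for name, s in example_scores.items()}
for name, label in predictions.items():
    print(name, label)
```

A score exactly at the cutoff is treated as tolerated here, matching the "≥ 0.05" wording in the text.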

#### DISCUSSION

Many animal species live on the Tibetan plateau. The plateau makes up 25% of Mainland China, with an average altitude exceeding 4,500 m (Thompson et al., 2000; Wei et al., 2016). To date, genomic studies have indicated several significant pathways and functional categories involved in the response to hypoxia on the Tibetan plateau, including energy metabolism and oxygen transmission, DNA repair, and ATPase production (Qiu et al., 2012; Ge et al., 2013; Li et al., 2013; Qu et al., 2013; Xiang et al., 2013). Moreover, SNP data obtained from population surveys have been used to investigate the mechanism underlying plateau adaptation (Wei et al., 2016) and have successfully identified the candidate genes EGLN1, PPARA, and EPAS1 as transcription factors with a pivotal role in adaptation (Buroker et al., 2012; Xiang et al., 2013; Gou et al., 2014; Wang et al., 2014).

In our previous exome sequencing of 330 cashmere goats, we identified a genomic region containing a desmosome gene, DSG3 (Desmoglein 3), under strong selection. DSG3 is a desmoglein protein-coding gene located in a cluster on goat chromosome 24 and consists of 16 exons.

In this study, the entire region of DSG3 (44 kbp) was sequenced. Subsequently, single-nucleotide polymorphisms were identified in a large population of indigenous Chinese goats to determine the genetic difference between the lowland and highland goat populations. Our results showed (a) polymorphisms in DSG3 that segregated the indigenous Chinese goat population into two clusters; (b) a significant positive linear correlation between altitude and the frequency of the mutant allele; and (c) non-synonymous candidate mutations indicating the contribution of the gene to high-altitude adaptation of Tibetan goats.

Moreover, our previous studies suggested that genes under positive directional selection, such as EPAS1 (Endothelial Per-ARNT-Sim (PAS) domain protein 1), EDNRA (Endothelin-1 receptor precursor), SIRT1 (Sirtuin type 1), and RYR1 (ryanodine receptor 1), are linked to hypoxia. EPAS1, RYR1, and DSG2 (Desmoglein 2) are related to cardiomyopathy, and PTPRJ (Receptor-type tyrosine-protein phosphatase eta), FUT1 (fucosyltransferase 1), HEG1 (heart development protein with EGF-like domains 1), PTPRZ1 (tyrosine phosphatase receptor-type Z polypeptide 1), SIGLEC1 (Sialic acid-binding Ig-like lectin 1, sialoadhesin), NPC1L1 (Niemann-Pick disease-type C1 gene-like 1), and NES (Nestin) are connected with the cardiovascular system, whereas DSG3 is important in maintaining the normal structure and function of hairs (Song et al., 2016). Some studies have also suggested that the DSG3 gene is associated with skin development (Koch et al., 1998; Song et al., 2016).

High phenotypic variation exists in the Cashmere goat, introduced by natural and artificial selection over more than 10,000 years in line with human demand (Naderi et al., 2008; Wang et al., 2016). Owing to evolutionary and artificial selection, the goat has adapted to different harsh environmental conditions (Song et al., 2016). Multiple genes have been described as having a significant role in hair and fleece growth in cashmere goats: the POU1F1 and PRL genes are associated with cashmere yield, diameter, and length, whereas the LHX2 and LIM genes regulate the generation of hair. The FGF5 gene is a key regulator of hair length, and the FGF9 gene promotes the regeneration of hair follicles after wounding. The WNT2 gene is involved in the initiation of hair follicles (Wang et al., 2016). Furthermore, the DSG1 gene is involved in hair follicle cell communication and hair follicle morphogenesis (Hanakawa et al., 2004; Dong et al., 2013). The DSG4 gene affects wool traits and is responsible for coat color in Cashmere goats (E et al., 2016). The DSG3 gene may likewise be linked with additional fiber growth in cashmere goats. Other studies have reported that the DSG1, DSG3, and DSG4 genes are expressed in keratinocytes of the basal and immediate suprabasal cell layers (Koch et al., 1998; Bazzi et al., 2006; Dusek et al., 2006). Keratinocytes produce keratin, the main protein for hair, nail, and skin synthesis (Nazari-Ghadikolaei et al., 2018).

The biological function of DSG3 is keratinocyte cell-to-cell adhesion in the basal and suprabasal layers of stratified squamous epithelia. DSG3 has been shown to be the self-antigen for autoantibodies in pemphigus vulgaris (PV) and is known as the PV antigen (PVA) (Koch et al., 1997). In past studies of mice and humans, DSG3 was predominantly expressed in the outer layer of skin tissues, such as the basal and suprabasal cell layers of the epidermis (Kljuic et al., 2003), whereas, among the other members of the desmoglein gene family, DSG1 is expressed only in suprabasal cell layers and DSG2 in the basal cell layer (Koch et al., 1997). DSG3 provides desmoglein compensation for DSG1 in pemphigus foliaceus (PF), a potentially fatal autoimmune blistering skin disease in which autoantibodies against DSG3 and DSG1 cause loss of keratinocyte cell adhesion (Payne et al., 2005).

Similarly, desmoglein 3 (DSG3), a SLC24A5 gene, is involved in lighter skin pigmentation in Europeans, showing strong signals of positive selection in the high-altitude populations (Huerta-Sanchez et al., 2013). Furthermore, desmoglein genes (DSG1-4) are expressed in the epidermis and myocardium tissue, and their disruption causes some autoimmune diseases that affect the skin, heart, and mucous membranes in humans (Garrod and Chidgey, 2008; Thomason et al., 2010). Desmoglein 3 (DSG3) has been identified as one of the autoantigens in an autoimmune blistering skin disease called pemphigus vulgaris (PV) (Stanley and Amagai, 2006; Hartlieb et al., 2013). In this disease, circulating autoantibodies targeting DSG3 induce loss of cell cohesion within the epidermis and mucous membranes. DSG3 was considered to be a new gene marker for terminal differentiation and was found to be highly upregulated during early-phase acute hypobaric hypoxia in rats (Sharma et al., 2015). A few studies have addressed the functional role of DSG3; however, its function has not yet been clearly established. It has been suggested that DSG3 may be involved in more than cell-cell adhesion (Tsang et al., 2010). The DSG3 gene is mostly expressed in stratified epithelial tissues (Stanley and Amagai, 2006). Epithelial tissue plays an essential role in the diseases of altitude because its cells line the alveoli of the lungs (air sacs) and the blood vessels (endothelium). These tissues have no blood vessels and thus must receive oxygen from the blood vessels in the adjacent connective tissue. For this reason, they are particularly susceptible to damage from low oxygen at high altitude. High-altitude hypoxia impacts vital cellular structures such as sodium and potassium pumps, and transcription of the genes for the sodium and potassium pumps in the alveoli of the lungs slows. The resulting decrease in the number of these pumps contributes to the accumulation of water in the alveoli and causes high-altitude pulmonary edema (Litch, 2006). Several candidate genes have a key role in controlling complex cellular processes, energy metabolism, the oxygen respiration chain, and the regeneration of adenosine triphosphate (ATP) in cell tissues (Niu et al., 2017). During the evolutionary process, the genes functioning within the oxygen respiration chain are considered to have been under positive selection. These genes are important for adaptation to high-altitude environments (Xu et al., 2007). Our current study has also shown the importance of desmosome genes in high-altitude hypoxia adaptation, which is supported by the SNP data validation analysis.

To our knowledge, there are no research publications on polymorphism of the DSG3 gene and its role in adaptation to hypoxia at high altitude. In this study, twenty-seven SNP sites were found in the whole sequence of the DSG3 gene. The allele frequency of the variants was much lower in lowland than in highland goat populations. These results indicate that the allele frequencies of the variants might be associated with evolution, selection, genetic variation, and ecological conditions (Li et al., 2004a; Wang et al., 2011). Pearson correlation analysis of SNP1, SNP2, SNP3, SNP4, and SNP5 showed a significant relationship between allele frequency and altitude. These results are in line with those for the EPAS1 gene in the Cashmere goat and Tibetan dog (Fan et al., 2015; Song et al., 2016).

Similarly, the global FST value varied among the Chinese indigenous goat populations at all the SNP sites. The pairwise genetic distances were an ideal index to measure the polymorphism between the groups. Some exons and introns were excluded from further analysis due to a lower pairwise FST value. Three non-synonymous and two synonymous mutations showed large polymorphisms and pairwise genetic distances in the DSG3 gene. Furthermore, these mutations segregated the goat populations into distinct branches, as discussed in previously reported studies (Xiang-Long and Valentini, 2004; Di et al., 2011; Ling et al., 2012). Linkage disequilibrium and haplotype analysis showed that the non-synonymous SNPs are highly linked with the synonymous SNPs and had higher haplotype frequency variations in the genome region of DSG3 in highland goat populations (Nordborg and Tavaré, 2002; Conrad et al., 2006; Li et al., 2006).

In this study, three non-synonymous mutations were found in the DSG3 gene, of which one mutation, G597S, is similar to the mutation G305S at EPAS1 and G14S at HBB in the dog (Gou et al., 2014; Fan et al., 2015). G305S is an important mutation that causes a protein functional change in the PAS domain and has a major effect on blood flow resistance (Adzhubei et al., 2010; Gou et al., 2014). Moreover, it has been demonstrated to lie at a strongly conserved amino acid site of EPAS1 in the dog (Fan et al., 2015). Therefore, the non-synonymous mutation G597S is likely to affect the function of the DSG3 gene (**Figures 4A–C**), which suggests a role in high-altitude hypoxic adaptation.

Gene mutations and polymorphisms are important for understanding selection, adaptation, and biological evolutionary processes and for identifying the genes related to genetic disease and particular traits (Akey et al., 2002; Nielsen, 2005). Single-nucleotide polymorphisms (SNPs) in the exonic regions of DNA sequences have shown many evolutionary changes. Non-synonymous substitutions change the encoded amino acid sequence and can alter the function of the protein, whereas synonymous SNPs do not change the encoded amino acid sequence but may affect the timing of cotranslational protein folding (Sun et al., 2013). In this study, we explored five significant SNPs in the important skin gene DSG3, comprising three non-synonymous and two synonymous substitutions. They showed a significant positive linear correlation between mutant allele frequency and altitude in the Chinese indigenous goat populations.

Future investigations of the DSG3 gene in goats should focus on how the expression level of the gene relates to these mutations and to altitude. This study has provided information on the potential function of an allele of DSG3 in high-altitude adaptation and strengthens the understanding of how the Tibetan goat has adapted to the extremely harsh high-altitude environment.

#### CONCLUSION

Our research has, for the first time, shown a role of the desmosomal gene DSG3 in high-altitude hypoxia adaptation, together with abundant genetic diversity between goat populations. We found that three non-synonymous candidate mutations (R597E, T595I, and G572S) and two synonymous substitutions significantly segregated these goat populations. Moreover, these results indicated the contribution of DSG3 to high-altitude adaptation and provided new insights for Tibetan cashmere goats.

#### AUTHOR CONTRIBUTIONS

CK and LJ wrote the manuscript. XH, QZ, and YP contributed to sample collection and manuscript design. CK and SS performed the data analysis. YM and LJ conceived the study design. YM, KKM, and AAK interpreted the results and revised the manuscript. All the authors read and approved the final manuscript.

#### FUNDING

This research was funded by the National Natural Science Foundation of China (Nos. 31472064 and 31601910), the Special Fund for Agro-Scientific Research in the Public Interest (201303059), and the earmarked fund for Modern Agro-industry Technology Research System (CARS-40-01). LJ was supported by the Elite Youth Program in the Chinese Academy of Agricultural Sciences.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00553/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Kumar, Song, Jiang, He, Zhao, Pu, Malhi, Kamboh and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Dissecting the Genomic Architecture of Resistance to Eimeria maxima Parasitism in the Chicken

Kay Boulton<sup>1</sup>\*, Matthew J. Nolan<sup>2</sup>, Zhiguang Wu<sup>1</sup>, Valentina Riggio<sup>1</sup>, Oswald Matika<sup>1</sup>, Kimberley Harman<sup>2</sup>, Paul M. Hocking<sup>1</sup>†, Nat Bumstead<sup>3</sup>†, Pat Hesketh<sup>3</sup>, Andrew Archer<sup>3</sup>, Stephen C. Bishop<sup>1</sup>†, Pete Kaiser<sup>1</sup>†, Fiona M. Tomley<sup>2</sup>, David A. Hume<sup>1,4</sup>, Adrian L. Smith<sup>3,5</sup>, Damer P. Blake<sup>2</sup> and Androniki Psifidi<sup>1,2,6</sup>

<sup>1</sup> The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, United Kingdom, <sup>2</sup> Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, London, United Kingdom, <sup>3</sup> Enteric Immunology Group and Genetics and Genomics Group, Pirbright Institute, Woking, United Kingdom, <sup>4</sup> Mater Research Institute, The University of Queensland, Brisbane, St. Lucia, QLD, Australia, <sup>5</sup> Department of Zoology, Sir Peter Medawar Building for Pathogen Research, University of Oxford, Oxford, United Kingdom, <sup>6</sup> Department of Clinical Sciences and Services, Royal Veterinary College, University of London, Hatfield, United Kingdom

#### Edited by:

John Anthony Hammond, Pirbright Institute, United Kingdom

#### Reviewed by:

Emanuel Heitlinger, Leibniz Institute for Zoo and Wildlife Research (LG), Germany

Michael Kogut, Agricultural Research Service (USDA), United States

> \*Correspondence: Kay Boulton kay.boulton@roslin.ed.ac.uk †Deceased

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 31 May 2018 Accepted: 22 October 2018 Published: 26 November 2018

#### Citation:

Boulton K, Nolan MJ, Wu Z, Riggio V, Matika O, Harman K, Hocking PM, Bumstead N, Hesketh P, Archer A, Bishop SC, Kaiser P, Tomley FM, Hume DA, Smith AL, Blake DP and Psifidi A (2018) Dissecting the Genomic Architecture of Resistance to Eimeria maxima Parasitism in the Chicken. Front. Genet. 9:528. doi: 10.3389/fgene.2018.00528

Coccidiosis in poultry, caused by protozoan parasites of the genus Eimeria, is an intestinal disease with substantial economic impact. With the use of anticoccidial drugs under public and political pressure, and the comparatively higher cost of live-attenuated vaccines, an attractive complementary strategy for control is to breed chickens with increased resistance to Eimeria parasitism. Prior infection with Eimeria maxima leads to complete immunity against challenge with homologous strains, but only partial resistance to challenge with antigenically diverse heterologous strains. We investigate the genetic architecture of avian resistance to E. maxima primary infection and heterologous strain secondary challenge using White Leghorn populations of derived inbred lines, C.B12 and 15I, known to differ in susceptibility to the parasite. An intercross population was infected with E. maxima Houghton (H) strain, followed 3 weeks later by E. maxima Weybridge (W) strain challenge, while a backcross population received a single E. maxima W infection. The phenotypes measured were parasite replication (counting fecal oocyst output or qPCR for parasite numbers in intestinal tissue), intestinal lesion score (gross pathology, scale 0–4), and for the backcross only, serum interleukin-10 (IL-10) levels. Birds were genotyped using a high density genome-wide DNA array (600K, Affymetrix). Genome-wide association study located associations on chromosomes 1, 2, 3, and 5 following primary infection in the backcross population, and a suggestive association on chromosome 1 following heterologous E. maxima W challenge in the intercross population. This mapped several megabases away from the quantitative trait locus (QTL) linked to the backcross primary W strain infection, suggesting different underlying mechanisms for the primary and heterologous secondary responses. Underlying pathways for those genes located in the respective QTL for resistance to primary infection and protection against heterologous challenge were related mainly to immune response, with IL-10 signaling in the backcross primary infection being the most significant. Additionally, the identified markers associated with IL-10 levels exhibited significant additive genetic variance. We suggest this is a phenotype of interest to the outcome of challenge, being scalable in live birds and negating the requirement for single-bird cages, fecal oocyst counts, or slaughter for sampling (qPCR).

Keywords: intercross, backcross, Eimeria maxima, QTL, resistance, interleukin-10, oocyst output

#### INTRODUCTION


Coccidiosis is an intestinal disease caused by intracellular protozoan parasites of the genus Eimeria (Shirley et al., 2005). The control of coccidiosis is a challenge to the international poultry industry, with economic losses estimated at USD 3 billion annually (Dalloul and Lillehoj, 2006). Current control of coccidiosis relies on the prophylactic use of anticoccidial drugs, or vaccination with formulations of live wild-type or attenuated parasites (Crouch et al., 2003; McDonald and Shirley, 2009). However, use of some anticoccidial drugs has been curtailed by legislation, while the limited production capacity and costs of live attenuated vaccines compromise their utility in broiler flocks (Hong et al., 2006). Thus, there is a need for complementary strategies to control coccidiosis in poultry. A promising approach would be to breed chickens for increased genetic resistance and increased vaccine response to Eimeria parasitism since there is evidence for relevant host genetic variation (Johnson et al., 1986; Bumstead and Millard, 1992).

Coccidiosis in poultry is caused by seven distinct Eimeria species (Reid et al., 2014), with Eimeria maxima being one of the most common causes of coccidiosis in commercial broilers. Immunity introduced by primary infection (vaccination) against E. maxima is commonly strain-specific, with immune escape contributing to sub-clinical coccidiosis symptoms that include decreased feed conversion efficiency, marked weight loss and low performance (Fitz-Coy, 1992; Blake et al., 2005).

Johnson et al. (1986) demonstrated variance in coccidiosis susceptibility in chickens as a prerequisite to selective breeding for resistance. A subsequent study using several inbred White Leghorn lines established variance for benchmark phenotypes when chickens were infected with controlled doses of Eimeria spp. (Bumstead and Millard, 1987, 1992). The between-line variation observed in oocyst production was not correlated with weight loss or mortality, indicating that within-trait observations were a result of effect accommodation rather than parasite restriction. The greatest differences in parasite replication (PR) were between line 15I and line C major histocompatibility complex (MHC) haplotype B12 (C.B12) chickens, which produced relatively high and low numbers of oocysts, respectively (Bumstead and Millard, 1987; Smith et al., 2002). Most notably, primary infection with the Houghton or Weybridge reference E. maxima strains induces 100% protection against secondary homologous challenge in 15I and C line chickens (Smith et al., 2002). However, the outcome of heterologous challenge varied by parasite strain and host genotype combination (Smith et al., 2002; Blake et al., 2004, 2005). Despite the substantial financial losses to industry caused by coccidiosis, few studies have attempted to identify quantitative trait loci (QTL) for resistance to E. maxima infection and there are no relevant studies on the genetics of heterologous secondary challenge response.

The present study extends previous work in inbred chicken lines to determine the genetic architecture of E. maxima resistance, i.e., lack of PR, and protection against secondary challenge with a heterologous E. maxima strain. First, an F2 intercross of inbred White Leghorn chicken lines C.B12 × 15I was initially infected with E. maxima H, followed 3 weeks later by challenge with E. maxima W to investigate response to challenge with the heterologous strain. Fecal oocyst output was counted to determine severity of challenge. Second, a backcross population from the same two inbred lines [(C.B12 × 15I) × C.B12] was infected with E. maxima W to study primary resistance to parasitism. Three phenotypes were determined for these birds following infection: PR by qPCR for parasite numbers in intestinal tissue, intestinal lesion score (LS) (gross pathology, scale 0–4) and levels of serum interleukin-10 (IL-10), a novel biomarker found to be positively correlated with the pathology trait in chickens infected with E. tenella (Wu et al., 2016; Boulton et al., 2018). All birds were then genotyped using a 600K Affymetrix® Axiom® HD array (Kranis et al., 2013), enabling genome-wide association studies (GWASs), followed by pathway analysis to identify candidate genomic regions, pathways, networks and genes for resistance to E. maxima primary infection and effective responses to challenge with a heterologous strain.

## MATERIALS AND METHODS

#### Ethics Statement

These trials were conducted under Home Office Project Licence in accordance with Home Office regulations under the Animals (Scientific Procedures) Act 1986 and the guidelines set down by the Institute for Animal Health and RVC Animal Welfare and Ethical Review Bodies.

#### Parasites

The E. maxima Houghton (H) and Weybridge (W) strains were used throughout these studies (Norton and Hein, 1976). Routine parasite passage, sporulation, and dose preparation were undertaken as described previously (Eckert et al., 1995) using specific pathogen free Light Sussex or Lohman LSL chickens. Oocysts were used within 1 month of harvest.

#### Animals

Inbred chicken lines 15I and C, derived from White Leghorn flocks at USDA-ARS Avian Disease and Oncology Laboratory in East Lansing, MI, United States, were maintained by random mating within the specified-pathogen-free (SPF) flocks at the Pirbright Institute [formerly the Institute for Animal Health (IAH)], United Kingdom since 1962 and 1969, respectively.

F2 intercross birds (n = 195) were generated by crossing nine F1 (C.B12 × 15I) male progeny with 27 unrelated F1 female progeny at the IAH (Compton site). Six birds from each of the two parental lines, 15I and C.B12, were also hatched and kept under the same experimental conditions as F2 (individual cages post-challenge).

To generate the backcross (n = 214), 20 F1 (C.B12 × 15I) male progeny were crossed with 100 unrelated C.B12 line females. The breeding was performed in the SPF Bumstead facility at the Roslin Institute, The University of Edinburgh, United Kingdom. Day-old chicks were transported in isolated SPF containment to the Royal Veterinary College poultry barn, University of London, United Kingdom, where the primary infection with E. maxima W sporulated oocysts was conducted in floor pens.

#### Eimeria maxima Challenge Experiments

#### Intercross Population

F2 intercross (n = 195) and 12 parental line birds were initially infected by oral gavage with 100 sporulated oocysts of E. maxima H at 25 days of age and moved to individual cages. Feces were collected from each bird daily during the 5–10 days post-infection (pi) period. Three weeks later (47 days of age) a secondary challenge was initiated by oral gavage of 250 sporulated oocysts of E. maxima W. Feces were again collected from each bird daily during the 5–10 day post-challenge period.

#### Backcross Population

At 21 days of age, chickens were inoculated by oral gavage with either 1 ml distilled water (control group, n = 20) or 100 sporulated oocysts of E. maxima W (infected group, n = 194). To avoid cross-infection the control group was housed separately. Birds were euthanised humanely at day 7 pi, coinciding with the peak pathological effects of E. maxima (Rothwell et al., 2004), providing the greatest sensitivity for parasite genome detection (Blake et al., 2006). A blood sample from each bird was collected post-mortem via aortic rupture into 1.5 ml Sigma-Aldrich (Dorset, United Kingdom) microcentrifuge tubes. Bijou tubes (7 ml Sterilin™) containing 5–10 volumes of room temperature RNAlater® (Life Technologies, Carlsbad, CA, United States) were used to store 5.0 cm of intestinal tissue and content from either side of Meckel's diverticulum.

#### Phenotyping

Individual oocyst output was used to study the outcome of the E. maxima H primary infection and secondary heterologous E. maxima W challenge in the intercross chicken population. Oocysts were quantified daily (5 to 10 days post-infection and post-challenge) using a microscope and saturated salt flotation in a McMaster counting chamber (Eckert et al., 1995; Smith et al., 2002). Daily totals were combined to provide a total count for oocyst output per bird for both the primary infection and secondary challenge. Oocyst counts were log-transformed to approximate normal distribution.
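The combination of daily counts into a per-bird total, followed by log-transformation, can be sketched as below. The base of the logarithm and the handling of zero counts are not stated in the text; this sketch assumes log10 of the total plus one, and the function name and example counts are hypothetical.

```python
import math

def total_log_oocyst_output(daily_counts):
    """Sum daily oocyst counts for one bird and log10-transform the total.

    Adding 1 before taking the log is an assumption of this sketch; it keeps
    birds with zero recorded output defined.
    """
    if any(c < 0 for c in daily_counts):
        raise ValueError("counts must be non-negative")
    total = sum(daily_counts)
    return math.log10(total + 1)

# Hypothetical daily counts for days 5-10 (six collections); not study data.
bird_counts = [0, 12000, 450000, 900000, 150000, 3000]
log_total = total_log_oocyst_output(bird_counts)
print(f"log10 total oocyst output = {log_total:.2f}")
```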

The phenotypes used to study resistance to E. maxima W primary infection in the backcross population were relative intestinal Eimeria genome copy number (PR, measured using quantitative PCR as parasite genomes per host chicken genome), intestinal LS (pathology, on a scale 0–4), and serum IL-10 level (IL-10). Quantitative real-time PCR targeting the E. maxima microneme protein 1 (EmMIC1) and Gallus gallus β-actin (actb) loci was performed using total genomic DNA extracted from a 10 cm length of intestinal tissue centered on Meckel's diverticulum using a DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany). Briefly, each complete tissue sample was disaggregated using a Qiagen TissueRuptor and an aliquot was processed for extraction of combined host and parasite DNA (see Blake et al., 2006, for full details). A CFX96 Touch® Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, United States) was used to amplify each sample in triplicate (Nolan et al., 2015), with an additional Bead-Beater homogenization step prior to buffer ATL treatment (including 1 volume 0.4–0.6 mm glass beads, 3,000 oscillations per minute for 1 min). Intestinal pathology was assessed by the same experienced operator scoring lesions according to Johnson and Reid (1970). A capture ELISA was used to measure IL-10, employing ROS-AV164 and biotinylated ROS-AV163 as capture and detection antibodies, respectively (see Wu et al., 2016, for full details). IL-10 levels and parasite genome numbers were log-transformed to approximate normal distribution.

#### Phenotypic Correlations

Following log-transformation for PR and IL-10, all backcross phenotypic traits were rescaled to remove differences in units of measurement. Then, fitting host sex as a fixed effect in a multivariate linear model, phenotypic correlations (rP) were estimated using ASReml 4.1 (Gilmour et al., 2015).

#### Genome-Wide Association Studies

Sixty-seven F2 birds exhibiting the most extreme phenotypes, plus the 12 intercross parental line birds and the entire backcross generation, were genotyped using the 600K Affymetrix® Axiom® HD genotyping array (Kranis et al., 2013). Although each data set was analyzed separately, the same GWAS steps were used for both populations. The marker genotype data were subjected to quality control measures using the thresholds: minor allele frequency < 0.02 and call rate > 90%. Deviation from Hardy–Weinberg equilibrium was not considered a reason for excluding markers since these were experimental populations of inbred lines. After quality control 203,845 intercross and 204,072 backcross markers remained and were used, respectively, to generate separate intercross and backcross genomic relationship matrices (GRMs) to investigate the presence of population stratification. Next, each GRM was converted to a distance matrix that was analyzed with a classical multidimensional scaling using the GenABEL package of R (Aulchenko et al., 2007) to obtain principal components. These analyses revealed three principal components in the intercross population (one for each parental line and one for F2 birds), but no substructure in the backcross.
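The marker quality control step can be illustrated with a minimal sketch. It assumes that the stated thresholds mean markers were discarded when the minor allele frequency fell below 0.02 and retained only when the call rate exceeded 90%; the analysis itself was run in GenABEL, so the function `marker_qc` and the 0/1/2 genotype coding with `None` for failed calls are illustrative assumptions rather than the authors' code.

```python
def marker_qc(genotypes, min_maf=0.02, min_call_rate=0.90):
    """Return (call_rate, maf, keep) for a single marker.

    genotypes: per-bird allele counts (0, 1 or 2 copies of one allele),
               with None marking a failed genotype call.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0, False
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(freq, 1.0 - freq)             # minor allele frequency
    keep = maf >= min_maf and call_rate > min_call_rate
    return call_rate, maf, keep

# Hypothetical markers typed in ten birds; not study data.
polymorphic = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1]
monomorphic = [0] * 10
poorly_called = [0, 1, None, None, 2, 1, 0, 1, 1, 2]

for name, marker in [("polymorphic", polymorphic),
                     ("monomorphic", monomorphic),
                     ("poorly_called", poorly_called)]:
    print(name, marker_qc(marker))
```

Only the first toy marker passes both filters; the second fails on minor allele frequency and the third on call rate.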

Boulton et al. Maxima Genomics

GWAS for each trait were then conducted using GenABEL based on a mixed model, with the population principal components fitted as a covariate (intercross population only), sex fitted as a fixed effect in both studies, and GRM fitted as a random polygenic effect to adjust for population sub-structure. In the case of GWAS for heterologous secondary challenge response, the oocyst output following the first challenge was also fitted as a covariate to account for the effect of the first challenge. After Bonferroni correction for multiple testing, significance thresholds were P ≤ 2.45 × 10<sup>−7</sup> and P ≤ 4.90 × 10<sup>−6</sup> for genome-wide (P ≤ 0.05) and suggestive (namely one false positive per genome scan) significance levels, respectively, corresponding to −log<sub>10</sub>(P) of 6.61 and 5.30. The extent of linkage disequilibrium (LD) between significant markers located on the same chromosome regions was calculated using the r-square statistic of PLINK v1.09 (Purcell et al., 2007).
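The Bonferroni thresholds quoted above can be reproduced with a short calculation. This sketch assumes that the number of tests equals the 204,072 backcross markers retained after quality control; the intercross marker count (203,845) gives almost identical values. The function name is hypothetical.

```python
import math

def gwas_thresholds(n_markers, alpha=0.05):
    """Bonferroni genome-wide and suggestive P-value thresholds."""
    genome_wide = alpha / n_markers  # family-wise error rate of alpha
    suggestive = 1.0 / n_markers     # one expected false positive per scan
    return genome_wide, suggestive

genome_wide, suggestive = gwas_thresholds(204_072)
print(f"genome-wide: P <= {genome_wide:.2e}, -log10(P) = {-math.log10(genome_wide):.2f}")
print(f"suggestive:  P <= {suggestive:.2e}, -log10(P) = {-math.log10(suggestive):.2f}")
```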

Effects of the significant markers identified in each GWAS were re-estimated in ASReml 4.1 (Gilmour et al., 2015) by individually fitting the markers as fixed effects in the same model as used for GWAS analyses. Effects were calculated as follows: additive effect, a = (AA − BB)/2; dominance effect, d = AB − (AA + BB)/2, where AA, BB, and AB were the predicted trait values for each genotype class.
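As a worked example of these two formulas, the sketch below computes the additive and dominance effects from predicted trait values for the three genotype classes. The input values are invented for illustration and the function name is hypothetical.

```python
def additive_dominance(aa, ab, bb):
    """Additive (a) and dominance (d) effects from genotype-class predictions.

    aa, ab, bb: predicted trait values for genotypes AA, AB and BB.
    """
    a = (aa - bb) / 2        # half the difference between the homozygotes
    d = ab - (aa + bb) / 2   # heterozygote deviation from the homozygote midpoint
    return a, d

# Hypothetical predicted trait values; not study data.
a, d = additive_dominance(aa=3.0, ab=2.5, bb=1.0)
print(f"a = {a}, d = {d}")
```

With these toy values the homozygote midpoint is 2.0, so a = 1.0 and d = 0.5; a value of d equal to zero would indicate purely additive gene action.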

All significant markers identified in GWAS for responses to primary infection and secondary E. maxima W challenge were mapped to the reference Gallus gallus domesticus genome and annotated using the variant effect predictor<sup>1</sup> tool within the Ensembl (genome browser 92) database and the Galgal5 assembly<sup>2</sup>. Furthermore, genes located within 100 kb up- and down-stream of the significant markers were annotated using the BioMart data mining tool<sup>3</sup> and the Galgal5 assembly. This method of annotation enabled all genes located in the vicinity of the identified significant markers to be identified and cataloged.
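The 100 kb window used to catalog neighbouring genes amounts to an interval-overlap query. The study used BioMart for this step; the sketch below only illustrates the logic on an in-memory list, and the `Gene` record, the function name, and the gene names and coordinates are all hypothetical.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, inclusive

def genes_near_marker(chrom, pos, genes, flank=100_000):
    """Names of genes overlapping the window [pos - flank, pos + flank]."""
    lo, hi = pos - flank, pos + flank
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]

# Hypothetical annotation; gene names and coordinates are placeholders.
annotation = [
    Gene("GENE_A", "1", 1_000_000, 1_020_000),  # inside the window
    Gene("GENE_B", "1", 1_150_000, 1_160_000),  # beyond the window
    Gene("GENE_C", "2", 1_000_000, 1_010_000),  # different chromosome
    Gene("GENE_D", "1", 1_125_000, 1_200_000),  # partly overlaps the window
]

nearby = genes_near_marker("1", 1_030_000, annotation)
print(nearby)
```

A gene is counted if any part of it falls inside the window, which is one reasonable reading of "located within 100 kb"; requiring the whole gene to lie inside the window would be a stricter alternative.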

#### Re-sequencing Data Analysis

To identify possible protein-coding genes associated with the detected QTL, genomic sequences in the regions of interest from the line 15I and C.B12 chickens were compared. The two parental chicken lines were entirely re-sequenced at 15–20 fold coverage, using pools of 10 individuals per line, performed on an Illumina GAIIx platform using a paired-end protocol (Krämer et al., 2014). Re-sequencing data of the candidate regions (i.e., 1 kb up- and downstream of the candidate gene end sites), for resistance to primary infection and heterologous challenge derived from intercross and backcross analyses, were then extracted and examined separately. Using the Mpileup tool for marker calling (SAMtools v0.1.7; Li et al., 2009), single nucleotide variants (SNVs) between the two parental lines and the reference genome in these regions were detected. These were then annotated using the same variant effect predictor software as above. Information for all SNVs [intergenic, intronic, exonic, splicing, 3′ and 5′ untranslated regions (3′ UTR, 5′ UTR)] present in the regions of interest was collated. Intergenic, intronic, and exonic synonymous variants were then filtered out along with SNVs that were common in the two parental lines but different from the reference genome. Thus, only sites that were different between the parental lines and had an effect on the coding sequence (nonsense, missense, splicing) or a potential effect on the gene expression (3′ UTR and 5′ UTR) were retained for further study.
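The variant filtering rule described above can be expressed as a simple predicate. In this sketch the consequence labels are simplified placeholders rather than the exact terms emitted by the variant effect predictor, and the function name and example alleles are hypothetical.

```python
# Consequence classes retained for further study (simplified placeholder
# labels, not the exact annotation vocabulary of any tool).
KEPT_CONSEQUENCES = {"nonsense", "missense", "splicing", "3_prime_UTR", "5_prime_UTR"}

def retain_snv(allele_15i, allele_cb12, consequence):
    """Keep an SNV only if the parental lines differ at the site and the
    variant may affect the coding sequence or gene expression."""
    differs_between_lines = allele_15i != allele_cb12
    return differs_between_lines and consequence in KEPT_CONSEQUENCES

# Hypothetical sites: (line 15I allele, line C.B12 allele, consequence).
sites = [
    ("A", "G", "missense"),     # kept: lines differ, coding effect
    ("T", "T", "missense"),     # dropped: lines share the allele
    ("C", "T", "intronic"),     # dropped: intronic
    ("G", "A", "3_prime_UTR"),  # kept: lines differ, possible expression effect
    ("C", "A", "synonymous"),   # dropped: synonymous
]

retained = [s for s in sites if retain_snv(*s)]
print(retained)
```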

#### Pathway, Network, and Functional Enrichment Analyses

Identification of potential canonical pathways and networks underlying the candidate genomic regions associated with outcomes of primary infection and heterologous secondary E. maxima challenge was performed using the ingenuity pathway analysis (IPA) program<sup>4</sup>. IPA constructs multiple possible upstream regulators, pathways, and networks that serve as hypotheses for the biological mechanism underlying the phenotypes based on a large-scale causal network derived from the Ingenuity Knowledge Base. After correcting for a baseline threshold and calculating statistical significance, the most likely pathways involved are inferred (Krämer et al., 2014). The constructed networks can then be ranked using their IPA score based on the P-values obtained using Fisher's exact test [IPA score or P-score = −log<sub>10</sub>(P-value)].
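The relation between a Fisher's exact test P-value and the −log<sub>10</sub> score can be illustrated as follows. IPA is commercial software and its internal calculation is not reproduced here; this sketch uses the textbook right-tailed Fisher's exact test for the overlap between a candidate gene list and a pathway, with hypothetical function names and toy numbers.

```python
import math

def fisher_right_tail(k, n, K, N):
    """Right-tailed Fisher's exact P-value for gene-set overlap.

    N: genes in the background, K: background genes in the pathway,
    n: genes in the candidate list, k: candidate genes in the pathway.
    Returns P(overlap >= k) under the hypergeometric distribution.
    """
    total = math.comb(N, n)
    upper = min(K, n)
    favourable = sum(math.comb(K, i) * math.comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total

def p_score(p_value):
    """Score defined as -log10 of the P-value."""
    return -math.log10(p_value)

# Toy numbers: 10 background genes, 3 in the pathway, 3 candidates, all 3 overlap.
p = fisher_right_tail(k=3, n=3, K=3, N=10)
score = p_score(p)
print(f"P = {p:.4f}, score = {score:.2f}")
```

In this toy case only one of the 120 possible candidate lists contains all three pathway genes, so P = 1/120 and the score is about 2.08; larger scores correspond to smaller P-values.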

The gene lists for each phenotype were also analyzed using the Database for Annotation, Visualization and Integrated Discovery (DAVID; Dennis et al., 2003). To understand the biological meaning behind these genes, gene ontology (GO) was determined, and functional annotation clustering analysis was performed using the integral G. gallus background. The enrichment score (ES) of DAVID is a modified Fisher exact P-value calculated by the software, with higher ES reflecting more enriched clusters. An ES > 1 means that the functional category is overrepresented.

#### RESULTS

#### Descriptive Statistics

Phenotypic distributions for oocyst counts following primary infection with E. maxima H and secondary challenge with E. maxima W in the intercross and parental populations, along with relative DNA and IL-10 levels in the backcross populations after primary infection with E. maxima W, are presented in **Figures 1A–C**. After primary infection the pure line C.B12 birds produced fewer E. maxima oocysts than the pure line 15I and F2 birds, with the highest oocyst output recorded in the pure line 15I group. Conversely, the opposite pattern of oocyst output was recorded in the two parental lines following heterologous secondary strain challenge. These results agree with previous findings that show line C.B12 birds develop no cross protection between primary H and secondary W strain challenges, while line 15I birds develop significant cross-protection when infected in this order (Smith et al., 2002; Blake et al., 2005). As expected, for both primary and secondary challenges the F2 intercross population oocyst count level was intermediate between those of the two parental lines.

<sup>1</sup>http://www.ensembl.org/Tools/VEP

<sup>2</sup>https://www.ncbi.nlm.nih.gov/assembly/GCF\_000002315.4/

<sup>3</sup>http://www.ensembl.org/biomart/martview/

<sup>4</sup>www.ingenuity.com

Among the backcross chickens, following infection with E. maxima W, phenotypic scores for intestinal lesions were low (0–2); however, significant variance (P = 0.05) was noted (**Table 1**). Estimated phenotypic correlations between the three measured traits ranged from 0.8 to 0.15, with only the correlation between LS and IL-10 being statistically significant (r<sub>LS,IL-10</sub> = 0.15 ± 0.07; **Figure 1D** and **Table 1**).

#### Genome-Wide Association Studies

#### Intercross Study

Genome-wide association study analysis for oocyst output following primary infection of the intercross population with E. maxima H did not reveal significant associations after the strict Bonferroni correction. However, an association with markers on chromosome 2, just below the suggestive threshold, was observed (results not shown). GWAS analysis following secondary challenge with the heterologous E. maxima W strain identified 11 markers on chromosome 1, all having suggestive associations with the trait in the intercross population. These 11 markers belonged to the same LD block (499 bp, r<sup>2</sup> = 1; **Figure 2** and **Table 2**). The corresponding Q–Q plot for the GWAS intercross result is found in **Figure 2**.

Significant values are highlighted in italics. Covariances are presented below the diagonal, variances are shaded on the diagonal, with between-trait correlations above. Measured traits are parasite replication per host genome (PR), lesion score (LS), and serum interleukin-10 (IL-10).

The 11 significant markers associated with the outcome of secondary challenge by the heterologous E. maxima strain were all located in intronic, upstream, and downstream regions of the phenylalanine hydroxylase (PAH) gene (**Supplementary Table S1**). In the 0.5 Mb candidate region for enhanced response to heterologous secondary E. maxima challenge only 16 protein coding genes were located (**Supplementary Table S2**).

#### Backcross Study

Genome-wide association study results for resistance to E. maxima W primary infection in the backcross population revealed several significant genomic associations for each of the measured phenotypes. However, there was no overlap of the candidate genomic regions linked to parasite reproduction, intestinal pathology, or IL-10 induction (**Figure 3** and **Table 3**). Specifically, a single marker on chromosome 3 had a suggestive association with PR (**Figure 3A** and **Table 3**). Four suggestive marker associations were identified with markers on chromosomes 1, 2, and 3 for intestinal pathology (i.e., lesion damage; **Figure 3B** and **Table 3**). A further four associations were found for IL-10 on chromosomes 1, 2, and 5 (**Figure 3C** and **Table 3**). None of the markers found on chromosome 2 for LS and IL-10 were in common, nor were they in LD. However, the candidate QTL region for IL-10 on chromosome 2 was in proximity to a marker identified following primary infection with E. maxima H in the intercross population that fell just below the suggestive threshold. The corresponding Q–Q plots for GWAS are displayed in **Figure 4**. All significant markers identified in both studies exhibited significant (P < 0.01) additive genetic effects (**Table 3**).

TABLE 2 | Details of GWAS-identified and animal model-verified significant markers for oocyst output (OoC) from the intercross chickens following secondary challenge with the heterologous E. maxima W strain.

Details provided: Affymetrix marker identifier; chromosome and position of markers in the Galgal5 assembly (Chr:mb); additive genetic effects (GA), with significance values (P-value).

All of the significant markers identified for resistance to primary E. maxima W infection in the backcross population were located in intronic or intergenic regions (**Supplementary Table S3**). The candidate regions for response to primary E. maxima W infection contain a small number of genes: 36 protein-coding genes and four microRNAs (**Supplementary Table S4**).

#### Resequencing Analysis

In total, 3,230 variants were identified in the candidate regions associated with resistance to primary E. maxima infections. SNVs located in exonic regions accounted for less than 3% of the total, while the remaining SNVs (97%) were located in intronic, upstream, and downstream regions. Genes with SNVs that could potentially lead to non-functional transcripts were not detected. However, six genes contained missense SNVs that may affect the function of the encoded proteins. More specifically, the LONRF2, CHST10, PDCL3, and TBC1D8 genes on chromosome 1, FAM69C on chromosome 2, and IPCEF1 on chromosome 3 had missense SNVs with moderate effect. These genes also contained 3′/5′ UTR variants that may affect their expression. Details of the missense variants identified in the candidate regions for E. maxima resistance to primary infection are presented in **Supplementary Table S5**.

In total, 2,165 SNVs were detected in the candidate region on chromosome 1 for the response to heterologous secondary E. maxima W challenge. Most of the identified SNVs (95%) were located in intronic, upstream and downstream regions; 5% were located in exonic regions, mostly in 3′ and 5′ UTR regions. Nevertheless, three genes (PMCH, TBXAS1, THL3) containing missense variants with moderate effects as well as 3′/5′ UTR variants were detected. Details of the missense variants identified in the candidate regions for heterologous secondary E. maxima W challenge are presented in **Supplementary Table S6**.

TABLE 3 | Details of GWAS-identified and animal model-verified significant markers from the backcross chicken response to E. maxima primary infection.

Measured traits – parasite replication per host genome (PR), Lesion Score (LS), and serum interleukin-10 (IL-10). Details provided: Affymetrix marker identifier; chromosome and position of markers in the Galgal5 assembly (Chr:mb); the additive genetic effect (GA) and significance values (P-value).

#### Pathway, Network, and Functional Enrichment Analyses

The analyses for resistance to primary E. maxima infection revealed pathway enrichment for immune response involvement, including IL-10, interleukin-6 (IL-6), nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB), and Toll-like receptor signaling (**Figure 5**). Using the list of candidate region genes, two networks were constructed, comprising molecular interactions related to inflammatory response and disease, cell death and survival, cellular compromise, and cell cycle (IPA scores = 25; **Figures 6A,B**). A single enriched cluster was found, related to immune response linked to interleukin-1 (IL-1), Toll/IL1 response and cytokine-cytokine receptor response (ES = 2.2, with IL1R1, IL1RL1, IL2R, IL19R18, PTPRM, and COL14A genes involved).

The pathway analyses for response to heterologous E. maxima W strain secondary challenge revealed enrichment for both immune (prostanoid biosynthesis, retinoic acid mediated apoptosis signaling, eicosanoid signaling) and metabolic pathways (**Figure 7**). Two gene networks were constructed, related to cell signaling, nucleic acid metabolism and small molecule biochemistry (IPA score = 20), and cellular development, tissue development and function (IPA score = 45), respectively (**Figures 8A,B**). The accompanying functional annotation clustering analysis revealed two enriched clusters related to cell-to-cell signaling (ES = 1.7) and metal-ion binding (ES = 1.3).

#### DISCUSSION

Coccidiosis remains one of the costliest diseases for the international poultry industry. Selectively breeding chickens for enhanced resistance to Eimeria challenge, and for improved breadth of vaccine response, could provide a tractable strategy to improve coccidiosis control. We conducted two studies using different crosses between the White Leghorn inbred lines 15I and C.B12. Our data confirm that line 15I birds are more susceptible to primary infection with E. maxima than line C.B12 by overall PR (Smith et al., 2002; Blake et al., 2006). While the two inbred lines exhibit similar resistance/susceptibility profiles following primary infection with either of the two antigenically distinct E. maxima strains, they show radically different levels of protection against heterologous secondary challenge by antigenically distinct strains of the same pathogen (Smith et al., 2002). We therefore investigated the genetic background of resistance to primary and heterologous secondary E. maxima W challenges.

The resistance of chickens to Eimeria infection has traditionally been quantified using measures such as oocyst output and LS, indicating resistance to PR and parasite-induced pathology, respectively. For the former, the fewer oocysts excreted, the more resistant the chicken. Thus, oocyst shedding is considered to be an indicative trait and an accurate phenotype for calculating resistance to primary infection and subsequent parasite challenges, and this method was used in the intercross experiment. However, calculation of oocyst output by fecal flotation and microscopy is labor intensive. Thus, quantitative real-time PCR for parasite genome copies in intestinal tissues was used as an alternative measure of PR in the more recent backcross experiment (Blake et al., 2006). A third trait, serum IL-10, was also quantified for these latter chickens, providing a measure of the innate immune response to Eimeria infection (Rothwell et al., 2004; Boulton et al., 2018). IL-10 is produced after E. maxima and E. tenella primary infection of White Leghorn chickens (lines 15I and C.B12) and E. tenella primary infection of commercial broilers (Rothwell et al., 2004; Wu et al., 2016; Boulton et al., 2018). In all these cases, IL-10 was expressed at high levels in infected birds only, and significantly correlated with pathology (lesion scores). Here, GWAS from the backcross experiment identified markers associated with IL-10 that exhibit significant additive genetic variance. These findings, in conjunction with indications that IL-10 is correlated significantly with gross pathology following primary infection of a commercial population with E. tenella (Boulton et al., 2018), support the use of IL-10 as an accessible early-life biomarker in breeding programs aiming to enhance resistance to Eimeria challenge or to its pathological outcomes.

FIGURE 6 | Molecular interaction networks constructed from the canonical pathways identified in the backcross infection response relate to (A) inflammatory response and disease and (B) cell death and survival, cellular compromise and cell cycle.

Although the significance of E. maxima in field coccidiosis has been recognized for many years, there has been a limited number of genetic studies investigating host resistance to E. maxima primary infection and challenge. A recent study that investigated the genetic background of resistance to high-level E. maxima infection using the same HD genotyping array but measuring three different phenotypes (body weight gain, plasma coloration, and β2-globulin in blood plasma) identified several QTL on chromosomes 1, 2, 3, 5, and 10 in commercial Cobb500 broilers (Hamzic et al., 2015). Similar to our findings, Hamzic et al. (2015) found no QTL overlap among their different phenotypes. Interestingly, the QTL identified by Hamzic et al. (2015) on chromosome 1 for β2-globulin in blood plasma lies close to (2 Mb away from) the QTL found in our study for resistance to heterologous secondary E. maxima W challenge. Similar enriched biological pathways related to innate immune responses and metabolic processes were also detected in the two studies with this parasite species.

In other comparable work, Zhu et al. (2003) performed a linkage analysis study investigating chicken resistance in terms of oocyst output following controlled E. maxima infection using an F2-intercross between two broiler lines with different susceptibility to primary E. maxima infection. Using 119 microsatellite markers, one locus associated with E. maxima resistance was identified on chromosome 1 (Zhu et al., 2003). Expanding this work, Kim et al. (2006) used nine microsatellite markers located on chromosome 1 to refine this region. According to their results, the QTL peak was located a considerable genetic distance (i.e., 254 cM) away from the chromosome 1 QTL identified here and in the Hamzic et al. (2015) study. This could be attributed to the use of different chicken lines, E. maxima strains, analysis methods, and/or genotyping tools. It is worth mentioning that both the power to detect QTL and the resolution of their location are limited when a few microsatellites are used, compared to HD genotyping platforms.

Comparison of the re-sequencing data of the two parental chicken lines identified a small number of genes that differ regarding the presence of exonic variants with a putative functional effect on the encoded proteins. Two genes of interest with missense variants located in the candidate regions for resistance to E. maxima primary infection encode Phosducin Like 3 (PDCL3) and TBC1 Domain Family Member 8 (TBC1D8) proteins. These immune-related genes were included in the two networks related to inflammatory response, and cell death and survival, constructed by IPA. PDCL3 acts as a chaperone for the angiogenic vascular endothelial growth factor receptor, controlling its abundance and inhibiting its ubiquitination and degradation, and also modulating activation of caspases during apoptosis (Wilkinson et al., 2004; Srinivasan et al., 2013). TBC1D8 is involved in the regulation of cell proliferation, calcium ion transportation, and also has GTPase activator activity (Ishibashi et al., 2009).

The genes encoding Thromboxane A Synthase 1 (TBXAS1) and Pro-Melanin Concentrating Hormone (PMCH) are located in the candidate region and are of interest in resistance to secondary challenge by heterologous E. maxima W. TBXAS1 encodes a member of the cytochrome P450 superfamily of enzymes involved in both immune response and metabolism; it plays a role in drug metabolism, platelet activation and metabolism, and synthesis of cholesterol, steroids, and other lipids (Yokoyama et al., 1991; Miyata et al., 1994). The proinflammatory actions of thromboxane receptors have been demonstrated to enhance cellular immune responses in a mouse model (Thomas et al., 2003). PMCH encodes a preproprotein that is proteolytically processed to generate multiple protein products, including melanin-concentrating hormone (MCH) that stimulates hunger and may additionally regulate energy homeostasis, reproductive function, and sleep (Viale et al., 1997; Chagnon et al., 2007). In a further mouse model, MCH has also been reported as a mediator of intestinal inflammation (Kokkotou et al., 2008). Although the genes mentioned above are good functional candidates for resistance to primary infection and heterologous challenge with E. maxima, further studies are needed to confirm the present results and identify the actual causative genes and mutations.

The immune interactions between an intracellular pathogen and a host are complex and vary as a consequence of the survival mechanisms that have evolved in both (Blake et al., 2011; Blake and Tomley, 2014). It has been suggested that host control of challenge with Eimeria, an obligate intracellular pathogen, requires a strong inflammatory, mostly cell-mediated response (Shirley et al., 2005; Dalloul and Lillehoj, 2006). Also, host innate immune responses have been detected during initial pathogen exposure in several studies (Kim et al., 2008; Pinard-van der Laan et al., 2009; Wu et al., 2016; Boulton et al., 2018). According to our findings, several gene networks and pathways relating to innate, humoral, and cell-mediated immune responses were highlighted from the gene products located in the candidate regions for resistance to primary Eimeria infection. Among the canonical pathways, IL-10 signaling was the most significant, with relevance as a regulator of cytokines such as interferon-γ (IFN-γ). These findings agree with previous studies of Eimeria resistance that have highlighted IFN-γ and tumor necrosis factor (TNF) nodes as crucial (Pinard-van der Laan et al., 1998; Smith and Hayday, 2000a,b; Bacciu et al., 2014), since IL-10 downregulates IFN-γ production (Schaefer et al., 2009).

#### CONCLUSION

We identified genomic regions, putative candidate genes, canonical pathways and networks involved in the underlying molecular mechanisms of chicken resistance to E. maxima primary infection and to secondary heterologous E. maxima strain challenge. More emphasis should be placed on the relevant mechanisms for disease resistance, the response to secondary heterologous strain challenge and the role of IL-10 induction in immune responses to intestinal challenge in the future selective breeding of chickens.

#### AVAILABILITY OF SUPPORTING DATA

The resequencing data used in this study are available in NCBI dbSNP at the following web page: http://www.ncbi.nlm.nih.gov/SNP/snp\_viewBatch.cgi?sbid=1062063.

#### AUTHOR CONTRIBUTIONS

AS, PK, SB, FT, and DB devised the overall strategy and obtained funding. PK, SB, FT, and DB conceived the backcross experiments. PMH and KB devised the backcross breeding. MN managed the backcross trials and performed qPCR and DNA extraction assisted by KH and KB. Backcross phenotype collection was carried out by MN, KH and KB, while DB scored lesions. ZW performed IL-10 assays assisted by KB. KB prepared backcross DNA for genotyping and carried out all backcross analyses with input from AP, VR, and OM. AS designed the intercross trials with input from NB and these were carried out by PH and AA. AP performed an initial analysis of the intercross data with input from OM and KB. Pathway and resequencing analyses were performed by AP and KB. The manuscript was drafted by KB and AP with input from all other authors except PMH, SB, NB, and PK. AS, DB, FT, DH, and AP assisted in the interpretation of results.

#### FUNDING

The backcross work was funded by the BBSRC through the Animal Research Club (ARC) program under grants BB/L004046 and BB/L004003, while DEFRA OD0534 and BBSRC BB/E01089X/1 funded the intercross study.

#### ACKNOWLEDGMENTS

We thank our colleagues who assisted in the collection of phenotypes, including Iván Pastor-Fernández, Lucy Freem, Angela Stebbings and Nigel Salmon, and the staff at the NARF Bumstead, RVC animal welfare barn and Institute for Animal Health (now Pirbright Institute), Compton facilities. We also thank Richard Kuo for providing the re-sequencing information of the two inbred chicken lines.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00528/full#supplementary-material

#### REFERENCES

Aulchenko, Y. S., Ripke, S., Isaacs, A., and Van Duijn, C. M. (2007). GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296. doi: 10.1093/bioinformatics/btm108

Bacciu, N., Bed'hom, B., Filangi, O., Rome, H., Gourichon, D., Reperant, J. M., et al. (2014). QTL detection for coccidiosis (Eimeria tenella) resistance in a Fayoumi × Leghorn F2 cross, using a medium-density SNP panel. Genet. Sel. Evol. 46:14. doi: 10.1186/1297-9686-46-14



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with several of the authors, NB, PH, AA, and AS at time of review.

Copyright © 2018 Boulton, Nolan, Wu, Riggio, Matika, Harman, Hocking, Bumstead, Hesketh, Archer, Bishop, Kaiser, Tomley, Hume, Smith, Blake and Psifidi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Novel Resilience Phenotypes Using Feed Intake Data From a Natural Disease Challenge Model in Wean-to-Finish Pigs

Austin M. Putz<sup>1</sup>, John C. S. Harding<sup>2</sup>, Michael K. Dyck<sup>3</sup>, F. Fortin<sup>4</sup>, Graham S. Plastow<sup>3</sup>, Jack C. M. Dekkers<sup>1</sup>\* and PigGen Canada†

<sup>1</sup> Department of Animal Science, Iowa State University, Ames, IA, United States, <sup>2</sup> Department of Large Animal Clinical Sciences, University of Saskatchewan, Saskatoon, SK, Canada, <sup>3</sup> Department of Agriculture, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada, <sup>4</sup> Centre de Développement du Porc du Québec Inc. (CDPQ), Québec City, QC, Canada

#### Edited by:

Andrea B. Doeschl-Wilson, University of Edinburgh, United Kingdom

#### Reviewed by:

Ilias Kyriazakis, Newcastle University, United Kingdom Allan Schinckel, Purdue University, United States Susanne Hermesch, University of New England, Australia

> \*Correspondence: Jack C. M. Dekkers jdekkers@iastate.edu

†PigGen Canada authors are listed at the end of the article

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 24 April 2018 Accepted: 04 December 2018 Published: 08 January 2019

#### Citation:

Putz AM, Harding JCS, Dyck MK, Fortin F, Plastow GS, Dekkers JCM and PigGen Canada (2019) Novel Resilience Phenotypes Using Feed Intake Data From a Natural Disease Challenge Model in Wean-to-Finish Pigs. Front. Genet. 9:660. doi: 10.3389/fgene.2018.00660

The objective of this study was to extract novel phenotypes related to disease resilience using daily feed intake data from growing pigs under a multifactorial natural disease challenge that was designed to mimic a commercial environment with high disease pressure to maximize expression of resilience. Data used were the first 1,341 crossbred wean-to-finish pigs from a research facility in Québec, Canada. The natural challenge was established under careful veterinary oversight by seeding the facility with diseased pigs from local health-challenged farms, targeting various viral and bacterial diseases, and maintaining disease pressure by entering batches of 60–75 pigs in a continuous flow system. Feed intake (FI) is sensitive to disease, as pigs tend to eat less when they become ill. Four phenotypes were extracted from the individual daily FI data during finishing as novel measures of resilience. The first two were daily variability in FI or FI duration, quantified by the root mean square error (RMSE) from the within-individual regressions of FI or duration at the feeder (DUR) on age (RMSEFI and RMSEDUR). The other two were the proportion of off-feed days, classified based on negative residuals from a 5% quantile regression (QR) of daily feed intake or duration data on age across all pigs (QRFI and QRDUR). Mortality and treatment rate had a heritability of 0.13 (±0.05) and 0.29 (±0.07), respectively. Heritability estimates for RMSEFI, RMSEDUR, QRFI, and QRDUR were 0.21 (±0.07), 0.26 (±0.07), 0.15 (±0.06), and 0.23 (±0.07), respectively. Genetic correlations of RMSE and QR measures with mortality and treatment rate ranged from 0.37 to 0.85, with QR measures having stronger correlations with both. Estimates of genetic correlations of RMSE measures with production traits were typically low, but often favorable (e.g., −0.31 between RMSEFI and finishing ADG). Although disease resilience was our target, fluctuations in FI and duration can be caused by many factors other than disease and should be viewed as overall indicators of general resilience to a variety of stressors. In conclusion, daily variation in FI or duration at the feeder can be used as heritable measures of resilience.
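As an illustration of the two kinds of phenotype summarized above, the following minimal Python sketch computes an RMSE-type variability measure from a within-pig regression of daily FI on age, and the proportion of days falling below a quantile-regression line. It is a sketch under stated assumptions, not the authors' implementation: the function names, the example data, the use of a simple linear regression, the division by n in the RMSE, and the quantile-line coefficients (treated here as already fitted) are all hypothetical.

```python
import math


def rmse_from_linear_fit(ages, values):
    """RMSE of residuals from an ordinary least-squares line of values on age."""
    n = len(ages)
    if n < 3 or n != len(values):
        raise ValueError("need at least three paired observations")
    mean_x = sum(ages) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, values))
    # Assumption: divide by n; the text does not state the denominator used.
    return math.sqrt(sse / n)


def off_feed_proportion(ages, values, qr_intercept, qr_slope):
    """Share of days with a negative residual from a given quantile line."""
    below = sum(1 for x, y in zip(ages, values)
                if y - (qr_intercept + qr_slope * x) < 0)
    return below / len(ages)


# Hypothetical daily feed intake (kg) for two pigs at the same ages (days).
ages = [100, 101, 102, 103, 104, 105]
steady_pig = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
erratic_pig = [2.0, 2.6, 1.2, 2.9, 1.5, 2.5]

rmse_steady = rmse_from_linear_fit(ages, steady_pig)
rmse_erratic = rmse_from_linear_fit(ages, erratic_pig)

# Hypothetical coefficients standing in for a 5% quantile regression that
# would, in practice, be fitted across all pigs.
off_steady = off_feed_proportion(ages, steady_pig, 1.0, 0.005)
off_erratic = off_feed_proportion(ages, erratic_pig, 1.0, 0.005)
```

In this toy example the pig with erratic intake has the larger RMSE and a non-zero proportion of off-feed days, which is the pattern these phenotypes are intended to capture.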

Keywords: resilience, disease resistance, feed intake, feeding duration, pigs

## INTRODUCTION

Disease resilience can be defined as the ability to maintain relatively undiminished performance levels under infection (Albers et al., 1987; Doeschl-Wilson et al., 2012; Mulder and Rashidi, 2017). In the literature, much focus has been placed on separating disease resistance and tolerance (Bishop, 2012; Bishop and Woolliams, 2014; Lough et al., 2015). Disease resilience is an alternative to selection for a combination of resistance and tolerance (Guy et al., 2012; Mulder and Rashidi, 2017). Most studies on resilience (e.g., Mulder and Rashidi, 2017), however, consider only a single disease, but an animal could be resistant or tolerant to one disease and more susceptible to others. Currently, there are dozens of pathogens for swine around the world, including viral, bacterial, and parasitic infectious diseases (Zimmerman et al., 2012). Pathogens can spread around the world, and new pathogens and alternative strains will continue to emerge. Breeding companies that market breeding stock across the globe have to simultaneously consider disease resilience to many of these pathogens and environments. Selecting animals that maintain performance in a typical commercial system provides a natural weighting of resilience to each disease based on the impact of each disease on productivity, along with the incidence or prevalence of the disease. van der Waaij et al. (2000) stated that observed production can be viewed as a selection index where the underlying components are weighted based on their impacts on performance. It is important, however, that the testing environment is representative of the target commercial environments. Resilience can be an effective but "black-box" approach to selection for disease resistance and tolerance in animals (Mulder and Rashidi, 2017). One of the challenges, however, is to obtain heritable measures or indicators of resilience for selection, as elite breeding populations are typically kept in high-health conditions.

Recently, Elgersma et al. (2018) exploited routinely collected daily milk yield to quantify resilience in lactating dairy cows because daily milk yield is sensitive to diseases such as mastitis. Both significant drops in milk yield and day-to-day variation in milk yield within cow were used to quantify resilience. These phenotypes did not quantify disease resilience specifically, as it was not possible to validate that all changes in milk yield were related to infectious diseases. This becomes a multifactorial issue as causes for drops in milk yield can include mastitis, lameness, subclinical ketosis, and displaced abomasum, among others (King et al., 2018). This leads to these types of phenotypes capturing disease resilience along with general resilience (Elgersma et al., 2018). When selection for growth under a high stress environment was practiced in cattle, Frisch (1981) found that the selected animals were more productive under challenge but that this selection did not change their growth potential. If the goal is to only target disease resilience, this is a disadvantage for measuring production or deviations in production. For instance, in dairy cattle, using somatic cell count as an indicator trait may be better for selection against only mastitis than measuring productivity fluctuations in milk yield or feed intake. However, if the breeding objective is to maintain productivity regardless of the causes associated with milk yield deviations (i.e., general resilience), phenotypes that measure changes in productivity over time within animal are likely to have an economic value themselves (Elgersma et al., 2018).

Much is known about the relationship between feed intake (FI) and anorexia (Sandberg et al., 2006; Kyriazakis and Doeschl-Wilson, 2009). Production of cytokines such as interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α) can cause a loss of appetite (Webel et al., 1997; Petry et al., 2007; Kyriazakis and Doeschl-Wilson, 2009). Knap (2009) suggested that individual day-to-day variation in feed intake could be utilized to quantify environmental sensitivity such as resilience to heat stress. Greater day-to-day variation in FI would indicate animals that are less resilient. Under a disease challenge, day-to-day variation in FI would reflect resilience to disease.

Alternative feed intake traits from individual FI electronic systems have been analyzed previously for the purpose of developing indicator traits for feed intake or feed efficiency in a selection index (de Haer et al., 1993; Von Felde et al., 1996; Schulze et al., 2003; Young et al., 2011; Lu et al., 2017). The most common and simplest of these traits are occupation time at the feeder (or duration), number of visits, and FI rate (kg feed per unit time). Other feeding traits during the course of a day have also been investigated (Kyriazakis and Tolkamp, 2018). Individual FI is typically recorded in high-health environments, which limits the use of these data in nucleus herds to quantify traits related to environmental sensitivity or resilience (mostly due to health). FI traits such as feeding duration (i.e., time at the feeder) could also exhibit day-to-day variability from causes such as illness and may be a more feasible alternative to collecting individual FI in these challenged environments if typical commercial feeders could be enhanced with antennae to collect time at the feeder on individual pigs (with RFID tags). Feeding traits such as duration become more valuable in severely challenged environments because a pig that stops eating completely is expected to spend zero time at the feeder.

The objectives of this study were to (1) develop and evaluate novel measures of resilience based on daily feed intake and feeding duration data for finishing pigs in a health-challenged environment and (2) determine heritabilities and genetic correlations of these measures with mortality, treatments, and other economically important production traits.

#### MATERIALS AND METHODS

This study was carried out in accordance with the recommendations of the Canadian Council on Animal Care (https://www.ccac.ca/en/certification/about-certification/).

The protocol was approved by the Protection Committee of the Centre de Recherche en Sciences Animales de Deschambault (CRSAD; http://www.crsad.qc.ca/). The Centre de développement du porc du Québec (CDPQ) had full oversight on the project along with veterinarians.

#### Natural Challenge Protocol

A natural challenge wean-to-finish protocol was established in late 2015 at CDPQ in Québec, Canada, with the aim to mimic a commercial farm with high disease pressure to maximize expression of genetic differences in resilience. The protocol was established at a research facility to allow detailed phenotype recording, blood sampling, and in vivo assays. This is an ongoing project that will conclude in early 2019. The natural challenge facility consists of three consecutive phases: (1) a healthy quarantine nursery for ∼19 days after weaning, (2) a late nursery phase, where pigs are first exposed to disease for ∼4 weeks, and (3) a finishing phase for the remainder of the growing period (69–181 days of age on average). Phases 2 and 3 are in the same barn, connected by a hallway, and are collectively referred to as the "challenge facility." Phase 1 is at a nursery approximately 1 km south of the challenge facility and is kept free of disease using strict biosecurity between the facilities. In the quarantine nursery, samples and measurements are taken for future development of early predictors of resilience in a non-challenged environment, typical of a genetic nucleus. The number of pigs per pen is approximately four, seven, and thirteen for phases one to three, respectively. The quarantine nursery was not available for cycle 1 (first seven batches), for which phases 1 and 2 were combined. During this period, strict biosecurity was practiced between the nursery and finishing unit (same building connected by a hallway), but this was not sufficient to keep diseases from getting into the nursery, after which the quarantine nursery was established.

The natural disease challenge was established by bringing in naturally infected animals (seeder pigs) from strategically selected farms into the challenge barn (late nursery and finishing). Four groups of 12–28 pigs were introduced from three different commercial farms in the first four months of the study as seeder pigs. Thereafter, monitoring for diseases was focused on the test population and less on the seeder pigs. Initially, the targeted diseases included porcine reproductive and respiratory syndrome virus (PRRSV), porcine circovirus type 2 (PCV2), Mycoplasma hyopneumoniae (M. hyo.), Actinobacillus pleuropneumoniae (APP), and swine influenza, as well as various opportunistic bacterial pathogens, including Streptococcus suis and Haemophilus parasuis. APP strain 12 was present. The three different strains of PRRSV present had ∼85–90% sequence identity to the PRRS-MLV (Boehringer Ingelheim, St. Joseph, MO). Every batch was confirmed to have been exposed to PRRSV based on sampling a subset of individuals using PCR and serology four and six weeks post challenge, respectively. Multiple influenza subtypes were present in the barn, including H1N1 and H3N2, based on serological testing of a subset of the population at 18 weeks post entry. No typing for PCV2 or M. hyo. was completed. The disease challenge was a function of these pathogens collectively in combination with the environment, management, and veterinary strategies designed to obtain a target infection pressure for each batch. The natural challenge was set up as a continuous flow system in order to maintain a steady health challenge without having to keep introducing pathogens, as well as for labor and flow considerations.
A new batch of naïve pigs enters every three weeks and is generally provided fenceline contact with the preceding batch for a ∼1-week period, except during periods of excessively high infection pressure, when contact is discontinued to help reduce mortality rate to sustainable levels established by the Animal Protection Committee. For the data used in this study, the following viruses were identified in the challenge facility: PRRSV (3 strains), Influenza A virus of swine (AIV; 2 strains), porcine circovirus type-2 (PCV2), and porcine rotavirus A (RVA). Bacterial and parasitic pathogens diagnosed included: Actinobacillus pleuropneumoniae (APP), M. hyo., Streptococcus suis, Haemophilus parasuis, Brachyspira hampsonii, Salmonella sp., Cystoisospora suis (coccidiosis), Ascaris suum, Erysipelothrix rhusiopathiae, and Staphylococcus hyicus (causative agent of exudative epidermitis). Not all pathogens were identified in all batches, as would be the case on a commercial farm, and other unidentified minor pathogens may also have been present. Although fairly endemic in the US, porcine epidemic diarrhea (PED) was not present in Québec and was therefore not present in the challenge facility.

To maintain acceptable levels of animal welfare and morbidity, individual treatments were given on a case-by-case basis, along with periodic batch-level (or mass) treatments. The treatment protocol was established by the consulting veterinarian, who is licensed in the province of Québec, Canada. Veterinarians had close oversight on the treatment protocol over time, which was adapted as needed to maintain acceptable levels of disease and minimize animal suffering. In addition, some treatment decisions were made by multiple veterinarians and trained barn staff, introducing some level of subjectivity, as would be the case in a commercial facility. Pigs exhibited clinical signs indicative of pneumonia, diarrhea, lameness, arthritis, meningitis, dermatitis, pallor, lethargy, weight loss, unthriftiness, cyanosis, or conjunctivitis. Pigs were treated with one of ten different antibiotics as per a regimented treatment protocol outlining primary and secondary (if needed) treatment choices for each ailment. For some clinical signs, one of two anti-inflammatory drugs was also administered. Batch-level water medication was used in the nursery when deemed necessary during periods of severe illness. One of two antibiotics was used in these batches. Furthermore, a water-soluble anti-inflammatory drug was also periodically administered in the nursery to treat batches that suffered from severe respiratory disease (primarily related to PRRSV infection). After the first seven batches, vaccination for PCV2 was added to the quarantine protocol in response to necropsy data linking characteristic lymphoid lesions with the presence of the virus. Reports from feed intake recording were generated daily for farm staff to identify sick pigs that did not eat as much as expected. Euthanasia decisions for animal welfare reasons were made by farm staff, with appropriate veterinary oversight.
Barn air and temperatures were controlled with a ventilation system and a heater was used to regulate the lower bound temperatures within the barn.

A new batch of pigs entered the natural challenge protocol every three weeks. Each batch consisted of ∼60 or ∼75 weaned Large White by Landrace (or reciprocal mating) barrows (castrated male pigs) that were provided by one of the seven members of PigGen Canada (https://piggencanada.org/) from healthy multiplier farms. Each batch was sourced from one multiplier, but over time different multipliers could supply pigs for a given PigGen member. Variables collected on piglets at the multiplier farms were date of birth, wean age, and biological sow ID. The protocol specified that two to four weaned barrows should be sampled per litter. Eighty-seven percent of all piglets met that criterion. Piglets were retagged with a sequential ID tag when they arrived at the first nursery. Every seven batches were considered a cycle, numbered one to three in the current study. Each company was represented once each cycle (i.e., one batch per company per cycle). This continued for a total of three cycles; therefore, each company was represented three times in the data analyzed here. This came to a total of 1,341 pigs that entered the facilities within the time period studied.

A fixed weight system was used to identify pigs for slaughter, starting at ∼180 days of age. Pigs that were not heavy enough were delayed for three weeks and then evaluated again. Most batches required two to four slaughter groups to slaughter all pigs from a batch. **Figure 1** shows a timeline of all batches analyzed in this study, with timing of date of birth, entry into the first nursery, entry into the finisher, and slaughter dates.

#### Data

All data and samples were collected by trained research staff from CDPQ following established protocols. Body weights were taken approximately every three weeks. However, if a pig was unhealthy, it may have been weighed a few more times, closer together in time. To obtain daily weights, a LOESS (Locally Weighted Scatterplot Smoothing) regression was fit to all weights available for an animal, using the loess function in R with default settings (R Core Team, 2017). LOESS regression is a form of nonparametric regression, also known as local regression, that can fit non-linear trends in a flexible enough manner to "connect the dots" between weight measurements. The correlation of LOESS predicted with observed weights was 0.9995 for days with an observed weight. The LOESS predicted daily weights were utilized for calculations of production measures such as feed efficiency and growth (see below).
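The interpolation of daily weights described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: it uses local linear fits with tricube weights, whereas R's loess defaults to local quadratic fits, and the function name `loess_predict` and the example pig are hypothetical.

```python
import math

def loess_predict(x, y, x_new, span=0.75):
    """Simplified LOESS: local *linear* fits with tricube weights.

    Assumption: R's loess defaults to local quadratic fits (degree = 2);
    degree 1 is used here only to keep the sketch short.
    """
    n = len(x)
    k = max(2, math.ceil(span * n))  # number of neighbours used in each local fit
    preds = []
    for x0 in x_new:
        dists = sorted(abs(xi - x0) for xi in x)
        # Bandwidth: distance to the k-th nearest point, inflated slightly so
        # that this point keeps a small non-zero weight.
        h = dists[k - 1] * 1.0001 or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        preds.append(my + slope * (x0 - mx))
    return preds

# Hypothetical pig weighed roughly every three weeks (age in days, weight in kg).
ages = [21, 42, 63, 84, 105, 126, 147, 168]
weights = [6.0, 14.5, 27.0, 43.0, 61.5, 80.0, 99.0, 117.0]
days = list(range(21, 169))
daily_weights = loess_predict(ages, weights, days)  # one predicted weight per day
```

The predicted daily weights can then be used wherever a weight is needed on a day without an observation, for example at the start of the finishing period.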

Feed intake data were recorded in the finishing phase using IVOG® feeding stations (Insentec, Marknesse, Netherlands). Feed was available ad libitum throughout the study. The nursery feeding protocol consisted of four diet phases, while the finishing period included two diet phases. Individual feed intake visits were processed and cleaned by CDPQ staff using the methods of Casey et al. (2005) and were aggregated into daily totals for each pig, including total amount of feed consumed (kg) and duration (time) at the feeder (minutes). Daily totals of more than 5 kg of feed were set to missing. Missing daily values were subsequently imputed using a 5-day rolling average within animal (also used if there were two adjacent days missing).
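The cleaning and imputation step for daily totals can be illustrated with a short sketch. The text does not specify how the 5-day window was positioned, so the version below assumes a window centered on the missing day that averages whatever observed values fall inside it; the function name `clean_daily_fi` is hypothetical.

```python
def clean_daily_fi(fi, max_kg=5.0, window=5):
    """Clean one animal's daily feed intake (kg); None marks a missing day.

    Assumptions (not stated in the source): the rolling window is centered
    on the missing day and only observed values inside it are averaged.
    """
    # Step 1: implausibly large daily totals are set to missing.
    cleaned = [None if (v is None or v > max_kg) else v for v in fi]
    half = window // 2
    out = list(cleaned)
    # Step 2: fill each missing day from its neighbours within the window.
    for i, v in enumerate(cleaned):
        if v is None:
            lo, hi = max(0, i - half), min(len(cleaned), i + half + 1)
            nbrs = [cleaned[j] for j in range(lo, hi) if cleaned[j] is not None]
            if nbrs:
                out[i] = sum(nbrs) / len(nbrs)
    return out

# A 6.1 kg day exceeds the 5 kg limit and is replaced by the local average.
example = clean_daily_fi([2.0, 2.2, 6.1, 2.4, 2.6])
```

Because imputation reads only from the cleaned (not the imputed) values, two adjacent missing days are each filled from observed neighbours rather than from one another.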

**Figure 2** shows the distribution of death age for pigs that died prior to slaughter (344 or 26% of the 1,341 total animals). All treatment and mortality events and reasons were recorded (assigned by CDPQ research staff). Main treatment reasons included respiratory distress (thumping), gray/brown scours, coughing, lameness, yellow scours, arthritis, and failure to thrive/poor/skinny/hairy. Main mortality reasons included failure to thrive/poor/skinny/hairy, thumping/heavy breathing, sudden death, meningitis, and lameness/arthritis. Only individual treatments were included in the analyses and batch treatments were removed. Virtually all treatment reasons and ∼89% of the mortality reasons were disease-related.

#### Traits

Traits used for validation of the resilience traits developed herein included mortality (binary 0/1, 1 = died), number of treatments (**TRT**), and number of treatments per 180 days (**TRT180**). Number of treatments was a count of the number of individual treatments received by a pig. An individual treatment included any drug injection into an individual animal. Group treatments applied to batches were not included, as these would be accounted for in the model by the fixed effect of batch anyway. Only pigs that survived to slaughter received a phenotype for TRT. TRT180 was the number of treatments standardized to 180 days and was computed for animals that reached 65 days of age (approximate age of entry into the finishing unit). For instance, if an animal received three individual treatments and died on day 80, the animal's adjusted TRT180 was (3/80) × 180 = 6.75. This was to standardize treatment rate to approximately the same scale as TRT and to be interpretable from a practical standpoint (number of treatments to slaughter).
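The TRT180 standardization can be written as a one-line calculation. The sketch below reproduces the worked example from the text; the function name `trt180` is hypothetical, and treating the denominator as the animal's age at death or at the end of recording is an assumption.

```python
def trt180(n_treatments, age_days, min_age=65, scale_days=180):
    """Treatment count rescaled to 180 days.

    Assumption: age_days is the age at death or at the end of recording.
    Animals that did not reach min_age receive no phenotype (None).
    """
    if age_days < min_age:
        return None
    return n_treatments / age_days * scale_days

# Worked example from the text: three treatments, died on day 80.
example_trt180 = trt180(3, 80)  # (3/80) * 180
```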

Two sets of resilience traits were derived from the daily FI data available for each pig. The first set of traits was derived as the root mean square error (**RMSE**) within animal from the regression of feed intake (FI) or duration (DUR) on age (**RMSEFI** and **RMSEDUR**, respectively), using ordinary least squares (OLS) linear regression. Duration is the daily time spent at the feeder in minutes. An example of the RMSE for one pig with two large deviations from illness is shown in **Figure 3** for FI (**Figure 3A**) and duration (**Figure 3B**). To obtain a phenotype for RMSE, animals had to have a minimum of 60 days of FI recorded. A less resilient animal is expected to have a larger value for RMSE. Preliminary analyses showed that without setting this minimal number of days, animals that died early in finishing were grouped on the left side of the distribution of RMSE (i.e., they would be considered more resilient). Duration (time) at the feeder was chosen over traits such as number of meals due to its strong association with off-feed events (e.g., **Figure 3**).
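A minimal sketch of the within-animal RMSE phenotype follows. It assumes the RMSE is the residual standard deviation from a simple OLS line with n − 2 residual degrees of freedom, which the text does not state explicitly; the function name and the simulated pig are hypothetical.

```python
import math

def within_animal_rmse(ages, values, min_days=60):
    """RMSE from the OLS regression of a daily trait (FI or duration) on age.

    Assumption: residual degrees of freedom are n - 2 (intercept and slope).
    Returns None when fewer than min_days records are available.
    """
    n = len(ages)
    if n < min_days:
        return None
    mx = sum(ages) / n
    my = sum(values) / n
    sxx = sum((a - mx) ** 2 for a in ages)
    sxy = sum((a - mx) * (v - my) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((v - (intercept + slope * a)) ** 2 for a, v in zip(ages, values))
    return math.sqrt(sse / (n - 2))

# Hypothetical pig with 70 recorded days and a steadily increasing intake (kg/d).
ages = list(range(70, 140))
healthy = [1.5 + 0.02 * (a - 70) for a in ages]
# The same pig with a five-day off-feed episode.
sick = [0.3 if 100 <= a <= 104 else v for a, v in zip(ages, healthy)]
rmse_healthy = within_animal_rmse(ages, healthy)
rmse_sick = within_animal_rmse(ages, sick)
```

In this toy example the off-feed episode inflates the RMSE, which is the behavior the phenotype is meant to capture.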

The second set of novel resilience phenotypes was based on quantile regression (**QR**), which can be useful for regression problems that include heterogeneous variances (Cade and Noon, 2003). A 5% quantile regression was performed using all data across batches, separately for FI and duration (**Figures 4A**,**B**). Negative residuals (below the regression line) from these regression equations were used to classify a day of FI or duration for an individual pig as an off-feed day (**Figures 4C,D**). These were aggregated within animal to a proportion of "off-feed" days (one record per animal). As with RMSE, each animal received only one phenotype for FI and for duration (**QRFI** and **QRDUR**). The 5% threshold was set based on **Figures 4A,B**, as it separated the "cloud" of relatively healthy days from off-feed days, as well as appraisal of FI plots within animal. In total, 258 animals (25%) did not have any day below the 5% quantile regression for QRFI, while a 1% threshold resulted in 677 animals (65%) not having any days below the threshold. To obtain a phenotype for QR, animals had to have at least 60 days of FI recorded (the same requirement as for RMSE). As with RMSE, susceptible animals are expected to have larger values for QR than resilient animals.
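The QR phenotype can be illustrated with a small, self-contained sketch. It assumes a linear quantile regression of the trait on age, and it fits the line by brute force over all pairs of points, which is only practical for toy data (a dedicated quantile regression routine would be used in practice). All function names and the simulated records are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss that quantile regression minimizes."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def fit_quantile_line(x, y, tau=0.05):
    """Brute-force linear quantile regression (toy data only).

    An optimal line always passes through two data points, so every pair
    is tried and the line with the smallest total pinball loss is kept.
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = sum(pinball_loss(yk - (intercept + slope * xk), tau)
                       for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]

def off_feed_proportion(records, tau=0.05, tol=1e-9):
    """records: (animal_id, age, value) tuples pooled across animals."""
    a, b = fit_quantile_line([r[1] for r in records], [r[2] for r in records], tau)
    counts = {}
    for animal, age, value in records:
        n_off, n_all = counts.get(animal, (0, 0))
        is_off = value < a + b * age - tol  # strictly below the quantile line
        counts[animal] = (n_off + is_off, n_all + 1)
    return {animal: off / total for animal, (off, total) in counts.items()}

# Three hypothetical pigs; pig C is off feed for four days (ages 83 to 86).
records = []
for age in range(70, 100):
    base = 2.0 + 0.02 * (age - 70)
    records.append(("A", age, base))
    records.append(("B", age, base + 0.10))
    records.append(("C", age, 0.2 if 83 <= age <= 86 else base + 0.05))
proportions = off_feed_proportion(records)
```

Only the four days on which pig C's intake collapses fall below the pooled 5% line, so pig C receives a non-zero proportion while the other two pigs receive zero. The 60-day minimum record requirement is omitted here for brevity.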

Production traits analyzed included nursery ADG (**NurADG**), finishing ADG (**FinADG**), average daily feed intake (**ADFI**), feed conversion ratio (**FCR**), residual feed intake (**RFI**), carcass weight (**CWT**), dressing proportion (**DRS**), lean yield (**LYLD**), carcass backfat (**CBF**), and carcass loin depth (**CLD**). To obtain a phenotype for a production trait, pigs had to complete the corresponding phase (nursery or finishing). Nursery and finishing ADG were calculated from regression slopes of daily LOESS weights (see above) on age for the entire nursery period (quarantine and challenge nursery) and the finishing period, respectively. NurADG started when the pig entered the quarantine nursery. NurADG ended and FinADG started the first day FI was recorded. LOESS predicted daily weights were used to compute ADG because a weight was not always available for the days when pigs were moved to the finishing unit. Also, some animals were weighed more often prior to euthanasia or death, which would influence the regression of weight on age (weights not evenly spaced). The impact of using LOESS predicted instead of observed weights was very small for FinADG (more weights) but more substantial for NurADG: the correlation between ADG computed with and without LOESS prediction (using the closest observed endpoints) was very high for FinADG but much lower for NurADG. This was because the nursery period was much shorter, and many pigs only had two weights prior to being moved to the finishing unit and therefore a larger change in ADG was observed. Average daily feed intake (ADFI) was the average feed intake of daily records during the finishing period. Feed conversion ratio (FCR) was defined as the sum of daily records for FI over the total body weight gain for that same finishing period. Residual feed intake (RFI) was computed in a one-step analysis following Cai et al.
(2008), using ADFI as the response variable and average body weight (average weight in the finisher), finishing ADG, and ultrasound backfat as covariates, along with other fixed effects, as described below. Ultrasound backfat was taken just prior to slaughter at the 10th rib. Dressing proportion was calculated by dividing the carcass weight (head on, leaf lard in, warm carcass) by the live weight prior to slaughter. Carcass backfat (CBF) and loin depth (CLD) were recorded using a Destron Fearing™ machine (Texas, USA) at the abattoir. Lean yield was calculated using the following regression equation for lean yield in Québec: LYLD = 68.1863 − (0.7833 × CBF) + (0.0689 × CLD) + (0.008 × CBF<sup>2</sup>) − (0.0002 × CLD<sup>2</sup>) + (0.0006 × CBF × CLD) (Pomar and Marcoux, 2003). This equation was mostly driven by backfat (r = −0.98). Not all batches had carcass data, leading to some variation in the number of observations for these traits. Carcass phenotypes were also captured at different time points within batch due to the protocol to only send the pigs that met market weight at each slaughter date, as mentioned above. The average live weight at slaughter was 118.9 kg.
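Several of the production traits defined above are simple calculations, sketched below for illustration. The function names are hypothetical, and the measurement units for CBF and CLD (assumed here to be millimetres) are not stated in the text; the coefficients of the lean yield equation are taken directly from the equation given above.

```python
def adg_from_slope(ages, weights):
    """Average daily gain as the OLS slope of (daily) weight on age."""
    n = len(ages)
    mx = sum(ages) / n
    my = sum(weights) / n
    sxx = sum((a - mx) ** 2 for a in ages)
    sxy = sum((a - mx) * (w - my) for a, w in zip(ages, weights))
    return sxy / sxx

def feed_conversion_ratio(daily_fi_kg, start_weight_kg, end_weight_kg):
    """Total feed consumed divided by total weight gain over the same period."""
    return sum(daily_fi_kg) / (end_weight_kg - start_weight_kg)

def dressing_proportion(carcass_weight_kg, live_weight_kg):
    """Carcass weight divided by live weight prior to slaughter."""
    return carcass_weight_kg / live_weight_kg

def lean_yield(cbf, cld):
    """Lean yield equation quoted in the text (Pomar and Marcoux, 2003).

    Assumption: cbf (carcass backfat) and cld (loin depth) are in mm.
    """
    return (68.1863 - 0.7833 * cbf + 0.0689 * cld
            + 0.008 * cbf ** 2 - 0.0002 * cld ** 2 + 0.0006 * cbf * cld)

# Hypothetical carcass with 20 mm backfat and 60 mm loin depth.
example_lyld = lean_yield(20.0, 60.0)
```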

#### Genotyping

Animals were genotyped with the 650K Affymetrix Axiom Porcine Genotyping Array by Delta Genomics (Edmonton AB, Canada). In total, 658,692 single nucleotide polymorphisms (SNPs) were included on the chip. Raw Affymetrix SNP data output was processed separately for each cycle by Delta Genomics with the Axiom® Analysis Suite using all defaults. The SNPs that passed quality control for all three cycles were utilized for analysis, for a total of 516,066 SNPs. Imputation of missing genotypes was completed with FImpute (Sargolzaei et al., 2014). The pedigree was utilized for imputation but only included the dam at the multiplier, since sire was typically unknown due to the use of pooled semen. Genotypes were then processed using the preGSf90 software from the BLUPF90 family of programs, using defaults (Misztal et al., 2002). Genotypes on seventeen samples were found to be duplicates and were removed. After all quality control, genotypes on 1,215 animals and 487,762 SNPs remained.

## Variance Component Analyses

Variance components were estimated by single-step GBLUP with the **H** matrix (Legarra et al., 2009; Christensen and Lund, 2010), using the BLUPF90 family of programs (Misztal et al., 2002). Data included phenotypes on 1,341 animals, of which 1,215 had genotypes. Basic animal models were fit for all traits, with random animal genetic effects (using the **H** matrix) and random residuals. The genomic relationship matrix (**G**) was calculated as **G** = **ZZ**′/Σ2pq (VanRaden, 2008), where **Z** = **M** − **P**. Only the dam was available to construct the **A** matrix. Single trait models were used to obtain heritability estimates and bivariate models for genetic correlations. Models for mortality and number of treatments included batch and age of entry into the quarantine nursery as fixed effects and were modeled as linear traits. Mortality was initially analyzed as a threshold trait but resulted in unrealistically large estimates of heritability. A simulation was used to confirm that threshold models tended to significantly overestimate heritability with small sample sizes such as in this study. One alternative could be to use a more recent approach from Ødegård et al. (2010), but mortality was not the main focus of this research. More data may be needed to analyze mortality as a threshold trait. Analyses for finishing traits included fixed effects of batch, finishing start age, and finishing pen. Litter effects (random) were minimal (below 0.05 for the proportion explained and within one SE of zero) for the traits analyzed and, therefore, were subsequently dropped from all analyses. Litter effects were also difficult to estimate, with an average of 2.02 litter mates per pig. Not all animals survived to record a phenotype for traits recorded later such as FinADG or carcass traits; therefore, for these traits the average was <2 litter mates per pig.

TABLE 1 | Counts and means for measures of resilience in three cycles of the natural challenge experiment (n = 1,341 total animals entered).

<sup>a</sup>Number of treatments, animals must have made it to slaughter.

<sup>b</sup>Treatment rate adjusted to 180 days, animals must have made it through 65 days of age to obtain a phenotype.

<sup>c</sup>Root mean square error (RMSE) from the within animal regression of Feed Intake (FI) or Duration (DUR) on age with at least 60 days of FI.

<sup>d</sup>Quantile regression (QR) from using the 5% QR over all the feed intake (FI) or duration (DUR) data and then aggregating off-feed days within animal as a proportion.
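As a rough illustration of the genomic relationship matrix used in the variance component analyses, the sketch below computes **G** = **ZZ**′/Σ2pq for a tiny genotype matrix. It is not the BLUPF90 implementation: it assumes genotypes are coded as 0/1/2 allele counts, that **P** holds twice the allele frequency of each SNP, and that allele frequencies are estimated from the genotyped animals themselves. The function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """G = ZZ' / sum(2pq), following the form given in VanRaden (2008).

    Assumptions: genotypes[i][j] is the allele count (0, 1, or 2) of animal i
    at SNP j, and allele frequencies p are estimated from these animals.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    denom = sum(2 * pj * (1 - pj) for pj in p)  # sum of 2pq over SNPs
    # Z = M - P: center each SNP column by twice its allele frequency.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
            for i in range(n)]

# Three hypothetical animals genotyped at three SNPs.
G = genomic_relationship_matrix([[0, 1, 2],
                                 [1, 1, 0],
                                 [2, 0, 1]])
```

In single-step GBLUP, **G** is subsequently combined with the pedigree-based **A** matrix to form **H**; that step is handled by the BLUPF90 programs and is not shown here.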

#### RESULTS

**Table 1** shows summary statistics for the three cycles of data used in the analyses (seven batches per cycle). Batches included from 59 to 77 pigs, except for one batch of 28 (not shown), and each cycle ranged from 441 to 452 pigs (1,341 total). Mortality was highest in cycle 1 (35%), decreased in cycle 2 (13%), and then returned to a higher rate in cycle 3 (29%). Mortality per batch ranged from 4 to 57%, with the median being 18%. The continuous flow system maintained pathogen burden throughout the study; however, seasonality clearly led to higher mortality during the winter months. In contrast to TRT180, TRT did not follow the mortality trend due to the requirement of survival to slaughter. In general, the FI resilience phenotypes followed the same time trend as mortality, except for QRDUR.

**Table 2** shows the number of observations and summary statistics for each trait. RMSE and QR measures of resilience were required to have 60 days of FI to receive a phenotype, which removed 188 animals from those that made it into the finishing unit. The average RMSEFI was 0.47 kg, ranging from 0.19 to 0.97 kg. RMSEDUR averaged 13.10 min, with a range of 5.71 to 37.54 min. One major difference between TRT and TRT180 was that TRT180 allowed animals that died after 65 days of age to record a phenotype, which added 219 phenotypes. Of those that survived, the number of treatments was 1.79 on average, but 2.43 for TRT180 (median of 1.97). Due to the health challenges, many of the production phenotypes had a wide range. Nursery ADG ranged from 0 to 0.67 kg/d and finishing ADG from 0.36 to 1.20 kg/d. This caused carcass weights to have a wide range as well, despite the aim to slaughter at a "fixed weight."

**Table 3** shows estimates of heritabilities and genetic correlations among the resilience traits and between resilience traits and production traits. Many estimates had large SE due to relatively small sample sizes. Heritability estimates for the novel resilience traits ranged from 0.15 to 0.26. Mortality had a heritability estimate of 0.13 ± 0.05, while TRT and TRT180 had estimated heritabilities of 0.13 ± 0.07 and 0.29 ± 0.07, respectively. The estimate of the genetic correlation between mortality and TRT180 was 0.93 ± 0.29 (results not shown). Estimates of genetic correlations among the novel resilience measures ranged from 0.01 to 0.67, indicating they are different genetic traits. Estimates of genetic correlations of mortality and TRT180 with novel resilience traits were positive, as expected, and ranged from 0.37 to 0.85. Due to data processing and removal of phenotypes from TRT because of the requirement of survival to slaughter, TRT180 was deemed to be a better phenotype for validation of the novel traits (**Table 2**). The estimate of the genetic correlation of RMSEDUR was 0.12 ± 0.76 with TRT and 0.62 ± 0.13 with TRT180. Of the two RMSE measures of resilience, RMSEDUR was more highly correlated genetically with mortality and treatments than RMSEFI. For the QR traits, QRFI had a slightly higher genetic correlation with mortality and number of treatments than QRDUR, which could be because farm staff received daily reports of which pigs were not eating enough feed and were flagged for further evaluation (see Discussion).

Estimates of genetic correlations of RMSE traits with production traits were low, but many were favorable (**Table 3**). Nursery ADG was unfavorably correlated with RMSEFI (0.77 ± 0.24) but most of the other production traits had favorable or close to zero genetic correlations with the two RMSE measures of resilience. Finishing ADG had a genetic correlation estimate of −0.31 ± 0.26 with RMSEFI and of −0.19 ± 0.26 with RMSEDUR. Feed efficiency based on FCR and RFI were genetically correlated with RMSEFI (0.39 ± 0.21 and −0.22 ± 0.27, respectively). Resilience based on QR measures was more strongly associated with production traits than resilience based on RMSE. Both QRFI and QRDUR had strong genetic correlations with FinADG, at −0.75 ± 0.26 and −0.70 ± 0.17, respectively. Notice, however, that QR was not strongly correlated with NurADG, likely because feed intake was only collected in the finisher. ADFI was negatively correlated with QRFI and QRDUR, at −0.79 ± 0.19 and −0.58 ± 0.16, respectively. Estimates of genetic correlations of QR with FCR were low, at −0.14 ± 0.35 and 0.02 ± 0.24, vs. −0.78 ± 0.21 and −0.63 ± 0.16 with RFI, which were similar to those for ADFI. Carcass BF and LD had negative genetic correlations with QR measures of resilience (−0.36 to −0.21).



<sup>a</sup>TRT, number of treatments for animals that made it to slaughter; TRT180, treatment rate adjusted to 180 days for pigs that made it to 65 days of age; RMSE, root mean square error (novel phenotype with FI or duration); QR, quantile regression as a proportion (novel phenotype with FI or duration); NurADG, nursery ADG; FinADG, finishing ADG, ADFI, average daily feed intake; FCR, feed conversion ratio (kg feed / kg weight gain); RFI, residual feed intake (adjusted for FinADG, metabolic weight, and ultrasound backfat); CWT, carcass weight; DRS, dressing proportion; LYLD, lean yield (equation using backfat and loin depth); CBF, carcass backfat; CLD, carcass loin depth. RMSE and QR phenotypes required 60 days of FI.

<sup>b</sup>Overall mortality proportion.

<sup>c</sup>Median, min, and max by batch, not individual.

<sup>d</sup>TRT required the animal to survive to slaughter. TRT180 required the animal survive to 65 days of age.

<sup>e</sup>Residual feed intake (RFI) was calculated using ADFI as the response in a one-step method.

**Table 4** shows estimates of genetic correlations of production traits with mortality and number of treatments. Production traits tended to have low genetic correlations with mortality (<0.30 in absolute value) but higher with number of treatments for some traits. Estimates of the genetic correlation of finishing ADG and ADFI with TRT and TRT180 ranged from −0.60 to −0.70. Carcass weight also showed a strong negative genetic correlation of −0.67 ± 0.14 with TRT180, similar to FinADG. Carcass BF, LD, and LYLD were weakly genetically correlated with both number of treatments and mortality.

#### DISCUSSION

Novel disease resilience measures were extracted from daily feed intake data of grow-finish pigs that were exposed to a multifactorial natural disease challenge that was designed to mimic a commercial environment with high disease pressure to maximize the expression of genetic differences of resilience between animals. Although the specific disease and environmental conditions that were established in this study cannot be exactly replicated, the general protocols established can be replicated in both research and commercial settings, similar to the replication of field studies on health-challenged farms. Moreover, although infection pressure waxes and wanes over time, it is assumed to be relatively consistent within batch because of the close proximity in which new batches are housed.

The resilience traits that were derived from individual daily feed intake data showed moderate heritabilities and moderate to strong genetic correlations with mortality and treatment rate. Genetic correlations with production traits tended to be low for the RMSE measures of resilience but higher for the QR measures. Data from the most important disease exposure period, i.e., the challenge (2nd) nursery, were not included in either RMSE or QR measures of resilience because individual feed intake could only be collected in the finishing unit. The challenge nursery period was, however, critical, as this represented the first exposure to many pathogens in the barn for most batches (nose-to-nose contact for new batches with older already infected batches). Thus, pigs could have been infected with pathogens and recovered in the nursery before feed intake recording started in the finishing unit. This may have reduced genetic correlations of the evaluated novel resilience traits with mortality or number of treatments. Future research could address this by collecting important phenotypic data during the entire challenge period or by setting up the nursery away from the finishing challenge facility.

The RMSE measures of resilience proposed here were designed to quantify severity of disease and other stressors on individual animals over time (see below), whereas QR measures of resilience classified days as off-feed events, reflecting more extreme events, making the QR measures less sensitive and showing less variation than RMSE measures. This may partially explain the slightly lower estimates of heritability for QR compared to RMSE measures and the higher genetic correlations of QR with TRT180 and mortality than RMSE, as both mortality and treatments are the result of severe clinical disease. Pigs were typically not euthanized until the disease had progressed and the animal was clearly suffering. Treatments were generally given only when clinical signs of illness were present (e.g., diarrhea, coughing, lethargy, etc.). RMSE measures of resilience may have the ability to capture subclinical disease and other stresses in addition to clinical disease, making them more sensitive than number of treatments, mortality, and QR measures of resilience, which typically capture only severe events. This would make RMSE measures of resilience different traits than treatments, mortality, and QR, which was supported by the estimates of genetic correlations. Although QR measures of resilience can also capture the effects of stressors other than disease, they are less likely to do so compared to RMSE due to the larger impact of disease on feed intake compared to other stressors (results not shown).

Quantile regression measures of resilience tended to have higher genetic correlations with production traits than RMSE, likely because pigs that grow slower typically have lower ADFI and, thus, when they get sick, they need a smaller drop in FI to drop below the QR line. In contrast, animals with high ADFI must drop further to fall below the QR threshold. Thus, pigs with low average FI across the finishing period are expected to have more days classified as being off-feed days, resulting in higher genetic correlations of QR measures of resilience with traits that are closely related to FI, such as ADG, than is the case for RMSE measures. Refining these resilience phenotypes will be a focus of future research.

TABLE 3 | Estimates of heritability (SE) for traits analyzed and of genetic correlations (SE) with resilience measures (n = 1,341 total animals, see Table 2 for actual counts per phenotype).

<sup>a</sup>RMSE, root mean square error (for FI or duration); QR, quantile regression as a proportion (for FI or duration); TRT, number of treatments for animals that made it to slaughter; TRT180, treatment rate adjusted to 180 days for pigs that made it to 65 days of age; NurADG, nursery ADG; FinADG, finishing ADG; ADFI, average daily feed intake; FCR, feed conversion ratio (kg feed/kg weight gain); RFI, residual feed intake (adjusted for FinADG, metabolic weight, and ultrasound backfat); CWT, carcass weight; DRS, dressing proportion; LYLD, lean yield (equation using backfat and loin depth); CBF, carcass backfat; CLD, carcass loin depth. RMSE and QR phenotypes required 60 days of FI.

#### Feed Intake Duration

Feeding duration was used in this study as a proxy for drops in FI. In the past, there have been many attempts to link feeding traits with FI (de Haer et al., 1993; Von Felde et al., 1996; Young et al., 2011; Lu et al., 2017). In animal breeding, these traits include duration (time at the feeder), number of visits, and feed intake rate. Previous studies were typically conducted in healthy environments and feeding traits such as duration at the feeder may become more valuable under disease challenge. **Figures 3**, **4** show that the patterns of FI and duration were very similar over time for the selected animal. Measures of resilience based on duration had comparable genetic correlations with mortality and number of treatments as measures of resilience based on FI in the present study. Day-to-day variation in duration at the feeder could be more applicable on commercial farms if commercial feeders could be retrofitted to record individual time at the feeder using antennae and RFID tags. This could also be extended into the nursery, allowing feeding traits to be collected over the entire wean-to-finish period. Additional research is needed to evaluate other feeding traits that can be extracted from electronic feeders (Kyriazakis and Tolkamp, 2018). Feeding patterns within a day may be useful and could be utilized to better quantify resilience. The current study took the simple approach and used daily totals, but this is only a starting point for more research on this topic.

## Causes of Variation in FI and Their Relationship With Resilience

Colditz and Hine (2016) presented a holistic view of resilience by including other stressors to define general environmental resilience. In the current study, it is not possible to verify that all drops in FI and duration at the feeder observed in our data are due to disease alone. Martínez-Miró et al. (2016) categorized animal stressors into social, environmental, metabolic, immunological, and human interactions. Each of these could be decomposed into more detailed stressors. For instance, immunological stressors can be broken down further into individual resistance, tolerance, or resilience toward PRRSV or PCV2 (among others). There can also be interactions between these stressors (Salak-Johnson and McGlone, 2007), although other studies have suggested some stressors may be additive (Hyun et al., 1998).

There is a long list of stressors that can impact feed intake and performance in swine. The impact of pathogens on feed intake has been well established in the literature (Sandberg et al., 2006; Kyriazakis and Doeschl-Wilson, 2009) and is dependent upon, but is not limited to, the type of pathogen, the strain of the pathogen, previous exposure, and vaccinations. Porcine reproductive and respiratory syndrome virus alone costs the swine industry an estimated \$664 million annually in the US (Holtkamp et al., 2013). Heat stress is another common reason why animals deviate from their expected FI (Guy et al., 2017), which has been characterized in growing pigs (Rauw et al., 2017b) and in sows (Vilas Boas Ribeiro et al., 2018). Mycotoxins have been known for a long time to influence feed intake (Smith et al., 1997). Social interactions (i.e., space requirements) are another common source of stress in swine (Hyun et al., 1998). These social effects have been investigated in piglets (Bouwman et al., 2010), in growing pigs (Street and Gonyou, 2008), as well as in sows (Hemsworth et al., 2013). Martínez-Miró et al. (2016) discussed many other stressors including human handling, vaccination, dust/gas/ammonia, and out of feed and water events.

<sup>a</sup>NurADG, nursery ADG; FinADG, finishing ADG; ADFI, average daily feed intake; FCR, feed conversion ratio (kg feed/kg weight gain); RFI, residual feed intake (adjusted for FinADG, metabolic weight, and ultrasound backfat); CWT, carcass weight; DRS, dressing proportion; LYLD, lean yield (equation using backfat and loin depth); CBF, carcass backfat; CLD, carcass loin depth.

<sup>b</sup>Did not converge.

Knap (2009) originally used an example of heat stress in pigs to show the potential relevance of day-to-day variability in feed intake. The measures of resilience developed here could also be used to quantify heat tolerance (Fragomeni et al., 2016; Guy et al., 2017) and activity level (Sadler et al., 2011; Gilbert et al., 2017; King et al., 2018), and possibly even to reduce stressful interactions for pigs (Rauw et al., 2017a). Heat stress was estimated to cost the US swine industry \$299 million per year (St-Pierre et al., 2003). These measures could also be based on other sources of data such as water intake data (Madsen and Kristensen, 2005; Rusakovica et al., 2017) or body temperature recordings on individual pigs (Petry et al., 2005, 2017). Elgersma et al. (2018) developed variation and "drop phenotypes" from milk yield data in dairy cows. The phenotypes developed in the current study could also be used to develop similar phenotypes for other species.

A problem with the interpretation of the types of resilience measures developed here and by Elgersma et al. (2018) is that factors influencing resilience phenotypes in general are still a "black-box" (Mulder and Rashidi, 2017), not only in terms of different diseases but for all the other stressors described above. This is one reason why we cannot expect the genetic correlation between RMSE and mortality or treatments to be one, as factors that influence feed intake could be non-health related. Another reason may be that RMSE captures sub-clinical disease better than QR (Elgersma et al., 2018, also mention this for their resilience traits). From a practical or commercial breeding standpoint, however, it probably matters little why animals deviate from expected feed intake. Traits presented in the current study should be thought of as having economic value (Elgersma et al., 2018). The usefulness of these novel traits in a breeding program will depend on the commercial environment and how representative the testing herds are of the target environments.

#### Genetic Parameters

Most estimates of heritability for production traits were within the accepted industry range (Ciobanu et al., 2011; Clutter, 2011), although this study was conducted under a strong health challenge. To the best of our knowledge, there are no estimates of genetic parameters for the novel resilience traits evaluated here in pigs. Variation for different traits has been explored as a potential indicator trait for resilience in dairy cattle. Green et al. (2004) evaluated the use of changes in somatic cell count (SCC) over time as an indicator for mastitis in lactating dairy cows and concluded that the maximum SCC and the standard deviation of log SCC were the best phenotypic indicators for incidence of mastitis. Recently, Elgersma et al. (2018) estimated genetic parameters for resilience traits from daily milk yield data from automated milking systems. Resilience indicators from milk yield data were calculated using the sum of "drop" days, negative slopes, and overall variation in milk yield calculated within lactation for each cow. Heritability estimates ranged from 0.06 to 0.10 and genetic correlations of variation in milk yield with udder health, ketosis, longevity, and persistency ranged from −0.29 to −0.52 (Elgersma et al., 2018). Elgersma et al. (2018), however, did not account for the individual cow milk yield trajectory over lactation when computing day-to-day variation in milk yield but targeted this for future research.

Heritability estimates for mortality and treatments in pigs are difficult to find in the literature because of the swine pyramid structure, which results in most studies focusing on data collected in herds with limited disease. Guy et al. (2018) estimated the heritability of treatments to be between 0.04 and 0.06. Commercial test herds using the three-way terminal cross are becoming more popular in the swine industry but results from such data are not commonly reported in the literature. One example is Dufrasne et al. (2014), who used a sire model to estimate variance components for mortality (culling) traits. Heritability estimates ranged from 0.03 to 0.14 using threshold models (Dufrasne et al., 2014) but the rate of mortality after weaning was very low (<1%), which seems unrealistic as typical commercial wean-to-finish barns have between 6 and 9% mortality on average (Stalder, 2017). Estimates of heritability for treatment and mortality could change with the amount of health challenge and incidence (Bishop and Woolliams, 2014). Companies will need to decide how much of a health challenge they need if they aim to select for resilience to disease. Challenging pigs too much comes at an economic and animal welfare cost. If pigs are not challenged enough, heritabilities of mortality and treatments may become lower and response to selection will be slowed (Mulder and Rashidi, 2017), although low heritabilities may be partially overcome with very large family sizes (many matings per sire). Treatment data are also challenging to collect in commercial testing systems. Many use mass treatments for disease outbreaks (e.g., feed and water medication). Water treatments may be more helpful for treatment under challenge, while feed medications may be more helpful for prevention (due to the off-feed events under challenge). Although it is possible to collect individual treatment data, commercial farms differ in the amount of data and the details they record.
Factors such as withdrawal times may influence when and if an animal receives treatment. When animals are treated and/or euthanized is based in part on subjective decisions by farm or veterinary staff. If antibiotic free production is involved, this may also influence the decision to treat an animal or not. Resilience is expected to be more economically important in those conditions as management cannot mask the genetic potential for resilience.

## Implementing Quantile Regression (QR) Phenotypes

One of the challenges when implementing QR phenotypes is that the quantile regression equation will depend on the severity of the disease challenge. For instance, if one barn is completely healthy over the years and another barn is severely challenged, the QR equation for each barn will be very different even if both are at the 5% level. This difference will be tied to how often contemporary groups are challenged and to what degree they are challenged. Mulder and Rashidi (2017) discussed the percentage of contemporary groups challenged and how that affects selection efficiency. When starting a commercial testing system, setting this initial QR threshold may be difficult. If the first groups are not heavily challenged, the QR equation will be set too high and will capture days that reflect normal daily variation in FI rather than illness or other stressors. With a weaker disease challenge, a more appropriate QR may be 1%. One possibility is to create a training dataset for QR based on contemporary groups that were challenged and set the threshold based on that regression. Another possibility would be to take only healthy contemporary groups and set a lower bound threshold based on those data.
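The idea of fixing a threshold from a training dataset and then scoring pigs against it can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it replaces the fitted quantile regression with an empirical per-age quantile, and the function names (`quantile`, `age_thresholds`, `proportion_below`) and the record layout are hypothetical.

```python
from collections import defaultdict


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values supplied")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def age_thresholds(training_records, q=0.05):
    """Build an age-specific lower threshold from training data.

    training_records: iterable of (age_in_days, feed_intake) pairs taken from
    the contemporary groups chosen as the training set (assumed layout).
    """
    by_age = defaultdict(list)
    for age, fi in training_records:
        by_age[age].append(fi)
    return {age: quantile(fis, q) for age, fis in by_age.items()}


def proportion_below(pig_records, thresholds):
    """Share of a pig's recorded days with FI below the age-specific threshold.

    Days at ages without a threshold are ignored; returns None if no day
    can be scored.
    """
    scored = [(age, fi) for age, fi in pig_records if age in thresholds]
    if not scored:
        return None
    below = sum(1 for age, fi in scored if fi < thresholds[age])
    return below / len(scored)
```

Changing `q` from 0.05 to 0.01 corresponds to the stricter level suggested above for weakly challenged groups. A fitted quantile regression would additionally smooth the threshold across ages, which this per-age version does not do.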

#### Heterogeneous Residual Variance in FI Data

One problem with using daily variation in FI vs. duration is that the variance of FI increases with age, which is not observed for daily duration data. This results in stressors having a greater impact on FI for older vs. younger pigs. As a result, RMSEFI puts a greater weight on later compared to earlier ages. Mean duration showed a slight negative trend with age and its variance was fairly constant across the finishing period. This may be one reason for the fairly low genetic correlation between FI and duration measures and could also explain why RMSEDUR had slightly higher genetic correlations with mortality and number of treatments than RMSEFI, as RMSEDUR weights the early finishing period the same as the late finishing period. An attempt was made to adjust RMSEFI for this increasing variation over time, but this still resulted in large outliers at later ages and did not improve the phenotype much in terms of genetic correlations to mortality and treatments (results not shown).
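To make the RMSE phenotype concrete, the sketch below computes a root mean square error from a within-animal linear regression of a daily trait (FI or duration) on age. It is an illustration under stated assumptions: a straight-line trend, a residual degrees-of-freedom denominator of n − 2, and hypothetical function names (`fit_line`, `rmse_from_trend`); the exact model used in the study may differ.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse_from_trend(ages, values):
    """RMSE of one animal's daily records around its own linear trend on age."""
    if len(ages) != len(values) or len(ages) < 3:
        raise ValueError("need at least three paired records")
    intercept, slope = fit_line(ages, values)
    sse = sum((v - (intercept + slope * a)) ** 2 for a, v in zip(ages, values))
    # Assumed denominator: residual degrees of freedom (n - 2).
    return math.sqrt(sse / (len(ages) - 2))
```

Because residuals are squared on the original scale, days at older ages, where FI varies more, contribute more to the result. This is the weighting effect described above for RMSEFI.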

#### Causes of Mortality

Most recorded mortality reasons were linked to disease. Exceptions included death from blood sampling, rectal prolapse, fighting, and fracture/sprains, which amounted to ∼11% of the 344 mortalities observed in these data. Removing mortality records due to non-health reasons, however, only resulted in small changes in estimates of heritability and genetic correlations, so they were left in the dataset for the current analyses, also because mortality by definition includes any pig that died regardless of cause. Although we typically think of mortality as health related, it is very multifactorial, as is sow mortality. One could decide to separate mortality by cause due to different genetic architecture for each cause and different economic weights (due to the average timing of death), but it is likely that the heritability would be even lower due to lower incidence, which would limit genetic selection. In addition, mortality for non-health related reasons could also have a genetic component. Treatments were almost exclusively linked to disease. Although some, such as lameness, could be argued to not be linked to disease, some diseases can be linked to lameness and removing them then becomes controversial.

### Impact of Increased Variation in Performance

One major impact of disease is the increase in variation in production phenotypes such as growth, causing some pigs to be less than full value when harvested (Fix et al., 2010). In the current study, if pigs did not make weight, they were held in the finisher until they made the target weight range, resulting in more pigs achieving full value when harvested. The definition of full value is not consistent in research or the swine industry and was therefore avoided in this analysis. Some production systems require all animals to leave at a certain date regardless of weight (i.e., fixed time systems). This would lead to additional costs from disease as a result of greater variability in slaughter weights and carcass weights. Hubbs et al. (2008) used moments beyond the mean to include variance and skew for determining optimal marketing decisions and concluded these higher-order statistics appeared to be more important than they were in the past. Not only are carcasses lighter and therefore worth less in total, sort loss (or discount losses) from not meeting the optimal weight grid will also penalize these animals (Boys et al., 2007). Sometimes, these lightweight animals can go to alternative markets, but not always (Fix et al., 2010).

## Use of Novel Traits in Healthy Nucleus Environments

The novel traits evaluated here can also be recorded in relatively healthy nucleus environments for stressors other than health challenges. A second major factor impacting these novel traits may be heat stress. The genetic correlation between RMSE in the nucleus vs. in a commercial environment (under disease challenge) will likely depend on the level and nature of the disease challenge in the commercial environment and the amount of heat stress in each environment (among other stressors). Barns located in the Southern/SE USA region will be affected differently by heat stress than those in the upper Midwest or Canada. This barn was in Québec, Canada, and the heat stress experienced was minimal compared to other areas around the world. The novel resilience traits evaluated here will likely have lower means and be less variable and have lower heritability under nucleus conditions but will still include resilience to stressors.

#### CONCLUSIONS

Day-to-day variation (RMSE and QR) in feed intake or duration at the feeder can be used to quantify resilience in health challenged environments, such as a commercial testing scheme. The novel resilience phenotypes studied here were moderately heritable and genetically correlated with mortality and treatment rate. The genetic correlations reported here may underestimate true correlations because the initial challenge period was missed: pigs were first challenged in the nursery, and feed intake data for RMSE and QR were recorded in the finishing unit only, whereas mortality and treatments were recorded over the entire wean-to-finish period. Many factors can cause variation in feed intake and in time at the feeder, including disease, heat stress, handling, and social interactions. Thus, the measures of resilience investigated here are still "black-box" phenotypes and should be viewed as general resilience instead of the narrower concept of disease resilience. Overall, daily variation in FI or associated duration data can be used to quantify resilience.

#### AUTHOR CONTRIBUTIONS

AP came up with the novel phenotypes, analyzed the data, and wrote the manuscript with help from JD. GP, MD, PGC, JH, FF, and JD designed the project and developed protocols for animal sourcing, management, and phenotype recording. JH was in charge of veterinary oversight on the project. GP was in charge of the database and genotyping for the project. FF was the lead on day-to-day data collection and scheduling. All authors helped with interpretation of the results and reviewed and approved the final manuscript.

#### FUNDING

This project was funded by Genome Canada, Genome Alberta, Genome Prairie and PGC.

#### ACKNOWLEDGMENTS

The following individuals served as collaborators and representatives for member companies of the PigGen Canada Consortium. They participated in project and protocol development and implementation, coordinated the sources of piglets and collection of associated data, and contributed to the project through regular discussions during execution of the natural disease challenge project: Mr. D. Vandenbroek and Mr. B. DeVries, Alliance Genetics Canada, St. Thomas, ON, Canada; Dr. N. Dion and Ms. S. Blanchette, AlphaGene, St.-Hyacinthe, QU, Canada; Dr. T. Rathje, DNA Genetics, Columbus, NE, USA; Mr. M. Duggan, FastGenetics, Saskatoon, SK, Canada; Dr. R. Kemp, Genesus, London, ON, Canada; Dr. P. Charagu, Hypor, Regina, SK, Canada; and Dr. P. Mathur, Topigs Norsvin, Helvoirt, The Netherlands. Mr. Michael Lowings at Delta Genomics (Edmonton, AB, Canada) is acknowledged for processing the raw genotypes from the Affymetrix SNP chip and Mr. Patrick Gagnon at CDPQ for processing the raw feed intake data. Thanks to Mr. Jason Grant at the University of Alberta for managing the database for the project and helping with genotyping and SNP map file questions. Special thanks to Dr. Wendy Rauw for helpful discussions on resilience and for her presentation on heat tolerant vs. non-heat tolerant growing pigs, which inspired the RMSE measure used in the present study.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a past co-authorship with one of the authors JD.

Copyright © 2019 Putz, Harding, Dyck, Fortin, Plastow, Dekkers and PigGen Canada. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Opportunities to Improve Resilience in Animal Breeding Programs

Tom V. L. Berghof\*, Marieke Poppe and Han A. Mulder

*Wageningen University & Research Animal Breeding and Genomics, Wageningen, Netherlands*

Resilience is the capacity of an animal to be minimally affected by disturbances or to rapidly return to the state pertained before exposure to a disturbance. However, indicators for general resilience to environmental disturbances have not yet been defined, and perhaps therefore resilience is not yet included in breeding goals. The current developments on big data collection give opportunities to determine new resilience indicators based on longitudinal data, which can aid to incorporate resilience in animal breeding goals. The objectives of this paper were: (1) to define resilience indicator traits based on big data, (2) to define economic values for resilience, and (3) to show the potential to improve resilience of livestock through inclusion of resilience in breeding goals. Resilience might be measured based on deviations from expected production levels over a period of time. Suitable resilience indicators could be the variance of deviations, the autocorrelation of deviations, the skewness of deviations, and the slope of a reaction norm. These (new) resilience indicators provide opportunity to include resilience in breeding programs. Economic values of resilience indicators in the selection index can be calculated based on reduced costs due to labor and treatments. For example, when labor time is restricted, the economic value of resilience increases with an increasing number of animals per farm, and can become as large as the economic value of production. This shows the importance of including resilience in breeding goals. Two scenarios were described to show the additional benefit of including resilience in breeding programs. These examples showed that it is hard to improve resilience with only production traits in the selection index, but that it is possible to greatly improve resilience by including resilience indicators in the selection index. However, when health-related traits are already present in the selection index, the effect is smaller. 
Nevertheless, inclusion of resilience indicators in the selection index increases the response in the breeding goal and resilience, which results in less labor-demanding, and thus easier-to-manage livestock. Current developments on massive collection of data, and new phenotypes based on these data, offer exciting opportunities to breed for improved resilience of livestock.

Keywords: resilience, livestock, breeding program, micro-environment, macro-environment, economic value, big data, longitudinal data

#### Edited by:

*Andrea B. Doeschl-Wilson, Roslin Institute, University of Edinburgh, United Kingdom*

#### Reviewed by:

*Jiuzhou Song, University of Maryland, College Park, United States Jack Dekkers, Iowa State University, United States Pieter Willem Knap, Genus-PIC, Germany*

> \*Correspondence: *Tom V. L. Berghof tom.berghof@wur.nl*

#### Specialty section:

*This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics*

Received: *26 June 2018* Accepted: *11 December 2018* Published: *14 January 2019*

#### Citation:

*Berghof TVL, Poppe M and Mulder HA (2019) Opportunities to Improve Resilience in Animal Breeding Programs. Front. Genet. 9:692. doi: 10.3389/fgene.2018.00692*

## INTRODUCTION

Modern livestock production is characterized by intensification, i.e., a higher number of animals per farm. To achieve successful intensification of livestock production, without negative effects on animals, farmers and farms, certain requirements need to be met. One of these requirements for intensification of livestock production is the capability of the farmer to take care of a larger number of animals. This requires healthy and easy-to-manage animals that need little/less attention time (Elgersma et al., 2018). Resilient animals are animals that need little/less attention time: increasing resilience is therefore desired. Improvement of resilience can be realized by different strategies. One strategy is to increase resilience by genetic selection in breeding programs. The advantage of genetic selection, in contrast to management improvements, is that it is cumulative and affects all subsequent generations of livestock.

We define resilience as "the capacity of the animal to be minimally affected by disturbances or to rapidly return to the state pertained before exposure to a disturbance" (adjusted from Colditz and Hine, 2016). Several definitions of resilience (e.g., Colditz and Hine, 2016), and resilience-associated concepts have been discussed in literature: robustness, tolerance, resistance, GxE interaction, genetic heterogeneity of environmental variance, plasticity, environmental sensitivity, canalization, (developmental) stability, and residual within-individual phenotypic variation (e.g., Holling, 1973; Debat and David, 2001; De Jong and Bijma, 2002; Flatt, 2005; Knap, 2005; Mulder et al., 2007, 2013; Bishop, 2012; Westneat et al., 2015; Colditz and Hine, 2016; Marjanovic et al., 2018). We have summarized these definitions and concepts in **Box 1**, but it is beyond the scope of this paper to discuss differences and similarities between these.

Disturbances can be of different nature, being either physical (e.g., disease, temperature stress) or psychological (e.g., novel environment, social stressor, human interaction) (see Colditz and Hine, 2016 for a review). "General" resilience is therefore a composite trait, consisting of different resilience types depending on the nature of the disturbance (Colditz and Hine, 2016; Elgersma et al., 2018). Disturbances can be categorized as "macro-environmental factors" or "micro-environmental factors" (Falconer and Mackay, 1996; Mulder et al., 2013). Macro-environmental factors are characteristics of an environment and thus affect the (majority of the) whole population within that macro-environment (e.g., disease pressure, ambient temperature). Genetic variation in the response to these macro-environmental factors can be expressed as the genetic variance of the slope of a reaction norm over different environments or different quantities of a disturbance (Mulder et al., 2013). Micro-environmental factors occur within a macro-environment, and thus affect only a minority of the whole population within that macro-environment (e.g., diseases, social interactions). Genetic differences in response to micro-environmental factors can be expressed as genetic variance in the size of environmental variance (Mulder et al., 2013). Despite the fact that macro- and micro-environmental sensitivities refer to different concepts and are best investigated with different methods (Mulder et al., 2015a), the estimated genetic correlation between them was high (0.76 with SE = 0.10; Mulder et al., 2013). This indicates that, even though disturbances have effects on a different scale, resilience to these disturbances has a common genetic background and will respond to selection for increased resilience in a similar direction.
Nevertheless, from a practical point of view, occasional macro-environmental disturbances, such as disease outbreaks and heat waves, are less frequent, and resilience to them is therefore of lesser importance, at least on most farms in temperate climates. Thus, easy-to-manage livestock is livestock with increased resilience to day-to-day micro-environmental disturbances within the macro-environment (i.e., a farm), but also with increased resilience to macro-environmental disturbances.

Resilience indicators are not yet included in selection indices as far as we know, despite their clear relevance for healthy and easy-to-manage livestock (and also production uniformity and production efficiency). This is likely due to uncertainty about how to define, measure and weight resilience indicators and their economic values, and due to the belief that current health-related traits in selection indices (e.g., longevity, mortality, growth) cover resilience, at least mostly (e.g., Knap, 2005). Although suggestions for measuring environmental sensitivity (Knap, 2009b) (i.e., resilience) and determining its economic value have been proposed (Knap, 2005), proper data collection and data processing tools were considered to be major challenges for implementation (Knap, 2009b). However, recent technological developments allow collection of big/longitudinal data and derivation of new phenotypes from it (Mulder, 2017). These developments will only continue to expand and will become more important in the future. New ideas to explore and exploit big data and new phenotypes are required. Also, health-related traits do not necessarily reflect general resilience, but mainly capture resistance to diseases (i.e., disease resilience; Mulder and Rashidi, 2017). In addition, breeding value estimation for disease-related traits is limited to only a few diseases, if any. Thus there seems to be additional benefit for including general resilience indicators in selection indices.

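The objectives of this paper were therefore:

1. to define resilience indicator traits based on big data;
2. to define economic values for resilience; and
3. to show the potential to improve resilience of livestock through inclusion of resilience in breeding goals.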

The paper addresses the described objectives sequentially and ends with overall conclusions, and therefore the paper differs in its setup from conventional research papers.

#### DEFINING RESILIENCE INDICATORS

Resilience has been a popular concept in a broad range of scientific disciplines in the last two decades (e.g., Scheffer et al., 2015; Ge et al., 2016). However, defining resilience indicators has been proven to be difficult, and different options have been proposed, which suggests that the "right" resilience indicator has yet to be defined. Recent technological developments give new opportunities to explore alternative resilience indicators based on longitudinal data (Mulder, 2017). Here we propose resilience indicators, which exploit the availability of many repeated observations per individual or per (sire-)family (i.e., big data). We will also describe some of the conditions these resilience indicators will have to fulfill in order to be informative, and their potential weaknesses.

**Box 1** (excerpt)

(Developmental) stability: the ability of a system to return to an equilibrium state after a temporary disturbance (Holling, 1973).

Residual within-individual phenotypic variation: amount of within-individual variance not explained in a specific statistical model (i.e., the average squared deviations of observations from an individual's reaction norm), averaged over a sample of individuals (Westneat et al., 2015).

#### Resilience Indicators

Measuring general resilience is difficult. Many studies have focused on one particular type of resilience (especially disease resilience) and have used experimental set-ups to identify underlying physiological mechanisms. However, these mechanisms strongly depend on the nature of the disturbance, are often chosen based on the interest of the study, and might characterize a phenotype that is (too) dependent on the investigated disturbance (Colditz and Hine, 2016). Even though these studies can provide useful information and insights in physiology, results may not be representative of general resilience. Furthermore, typically these challenge environments deviate from commercial environments and might therefore be less representative due to GxE interactions. Instead it has been proposed to measure "summary characteristics of response variables" (quoted from Colditz and Hine, 2016) as general resilience indicators (Knap, 2009b; Doeschl-Wilson et al., 2012; Colditz and Hine, 2016; Elgersma et al., 2018).

Colditz and Hine (2016) proposed a diverse set of "summary characteristics of response variables" to measure resilience to a disturbance, including typical production traits like feed intake, growth rate, and other production variables. Individuals experiencing a disturbance eat and produce less than (i.e., they deviate from) their potential without a disturbance (e.g., Van der Waaij et al., 2000). However, a deviation between observed production mean and estimated potential (as proposed by Colditz and Hine, 2016) does not necessarily fully cover the definition of resilience: resilient animals might have a severe drop in production, but might also have the capacity to rapidly return to the state pertained before exposure to a disturbance compared to less resilient animals. Deviations over a period of time (for at least the length of the disturbance) likely reflect resilience better. Thus, resilience might be measured based on deviations of expected production and observed production (i.e., residuals) over a period of time.

Many potential resilience indicator traits have been described, aimed at predicting changes in states of ecosystems (see Table 1 of Scheffer et al., 2015 for an elaborate overview). However, many of these resilience indicators are difficult to apply to livestock species or genetic improvement of livestock, because the resilience indicators are not suitable for livestock production data (e.g., spatial correlation, spectral reddening). In addition, a large number of investigated resilience indicators are alike, because they use the data in a similar way, i.e., the number of unique indicators is limited. However, we do see opportunities for some of them. In this paper, we elaborate on four resilience indicators, each of which we propose shows a different perspective of resilience [see also Scheffer et al. (2018)]. Suitable resilience indicators based on a single trait (e.g., production, feed intake) in animal production might be variance of deviations, autocorrelation of deviations, skewness of deviations of production traits over a period of time, and the slope of a reaction norm (see **Table 1**). We assume that in general negative deviations are mostly observed for less resilient animals, but this might be opposite depending on the trait, e.g., body temperature.


an autocorrelation toward +1 (i.e., subsequent deviations are more alike) for animals influenced by disturbances and with slow(er) recovery from disturbances; and an autocorrelation toward −1 (i.e., subsequent deviations are opposite) for animals influenced by disturbances and a fast and overcompensating response to disturbances, e.g., compensatory growth. We hypothesize that the autocorrelation of deviations captures the duration (i.e., rate of recovery) of environmental perturbations an individual experiences.


(Finlay and Wilkinson, 1963), e.g., disease pressure (Herrero-Medrano et al., 2015) or a heat wave (Ravagnolo and Misztal, 2000). The slope of a reaction norm is estimated based on the production of an individual given the level of a disturbance with: a slope of 0 for animals not influenced by the disturbance, and a slope below 0 for animals influenced by the disturbance with steeper, negative slopes for animals that are influenced more. We hypothesize that the slope of a reaction norm captures the severity of the macro-environmental perturbations an individual experiences.

Thus, less resilient animals are expected to have a larger variance, a positive autocorrelation, a negative skewness, and a steeper slope than the population average, assuming that the disturbance is reducing the trait value. Resilient animals are expected to have a smaller variance (i.e., closer to 0), an autocorrelation and a skewness around 0, and a slope closer to zero than the population average.
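The three deviation-based indicators summarized above can be computed from one animal's series of deviations (observed minus expected production). The sketch below is illustrative only: it assumes equally spaced records, uses the lag-1 autocorrelation as the autocorrelation measure and the moment-based skewness coefficient, and the function name `deviation_indicators` is hypothetical.

```python
def deviation_indicators(deviations):
    """Variance, lag-1 autocorrelation, and skewness of a deviation series.

    deviations: observed minus expected production for one animal,
    ordered in time and assumed to be equally spaced.
    """
    n = len(deviations)
    if n < 3:
        raise ValueError("need at least three deviations")
    mean = sum(deviations) / n
    centered = [d - mean for d in deviations]
    sum_sq = sum(c * c for c in centered)
    if sum_sq == 0:
        raise ValueError("deviations show no variation")

    variance = sum_sq / (n - 1)  # sample variance
    # Lag-1 autocorrelation: covariance of neighbouring deviations
    # relative to the total sum of squares.
    lag1 = sum(centered[i] * centered[i + 1] for i in range(n - 1)) / sum_sq
    # Moment-based skewness: third central moment over m2 ** 1.5.
    m2 = sum_sq / n
    m3 = sum(c ** 3 for c in centered) / n
    skewness = m3 / m2 ** 1.5

    return {"variance": variance, "lag1_autocorrelation": lag1, "skewness": skewness}
```

For a series with a single sharp drop, such as `[0, 0, -4, -2, 0, 0]`, the skewness is negative, in line with the expectation stated above for less resilient animals. A strictly alternating series such as `[1, -1, 1, -1]` gives a negative lag-1 autocorrelation.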

Some remarks regarding the proposed resilience indicators need to be made. A general remark is that scaling can cause animals with a higher mean to have a higher variance of deviations, although they may also be genetically more sensitive (i.e., less resilient) than animals with low production (Falconer and Mackay, 1996). Specific remarks to be made are: the underlying assumption for both variance and autocorrelation as resilience indicators is that deviations are mainly (or solely) negative, because disturbances will reduce production. However, variance and autocorrelation do not discriminate between negative and positive deviations. Therefore, animals showing production above the expected production, like compensatory production (perhaps a desired trait, part of resilience), will be characterized as not/less resilient. On the other hand, such animals are likely to have shown a severe reduction in production due to the disturbance, clearly showing lack of resistance against the disturbance. In contrast, skewness does differentiate between positive and negative deviations, but is likely to be more sensitive to outliers due to cubic terms. The autocorrelation can also be negative, meaning that an animal shows fast and over-compensating responses to disturbances, e.g., compensatory growth. Although a negative autocorrelation is preferred over a (strong) positive autocorrelation, an animal with an autocorrelation around 0 is not affected by a disturbance at all, and is therefore preferred over an animal with a negative autocorrelation. Finally, the estimation of the slope of a reaction norm requires quantification on farm level either by a known environmental covariate, e.g., temperature (Ravagnolo and Misztal, 2000; Bohmanova et al., 2007; Carabaño et al., 2017), or by an overall drop in production on a farm as a consequence of many individual drops in production on that farm (Rashidi et al., 2014; Herrero-Medrano et al., 2015).
In addition, each disturbance has its own slope, and it is also almost impossible to estimate multiple slopes for multiple disturbances occurring at the same time. However, the slope of a reaction norm can also be interpreted as (a form of) general resilience, although the slope likely does not capture the response to micro-environmental disturbances, which occur at the individual animal level, e.g., endemic diseases. Thus the slope only provides information for disturbances that cause a decline in farm performance, such as heat stress or a disease outbreak (in case of e.g., heat stress). Thereby the slope provides only information on macro-environmental disturbances. This is in contrast to the other three proposed resilience indicators based on deviations in **Table 1**, that can be used for epidemic (at the farm level) and endemic events (at the individual level). Therefore, these indicators provide information for estimation of breeding values on all types of disturbances and thus general resilience. But perhaps the "best" resilience indicator is yet to be defined based on a combination of the different resilience indicators: a multivariate approach of resilience, for example based on cross-correlations or eigenvalues which both capture the relation(s) between different resilience indicators based on different phenotypes (e.g., Lade and Gross, 2012; Scheffer et al., 2015; Gijzel et al., 2017), or an index of a number of indicators (i.e., a selection index approach, Hazel, 1943).
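The reaction-norm slope discussed above can be illustrated with a simple least-squares fit of each animal's production on a farm-level measure of the disturbance. This is a deliberately simplified sketch: genetic analyses typically estimate such slopes within mixed (random regression) models rather than animal by animal, and the function name `reaction_norm_slopes` and the record layout are hypothetical.

```python
from collections import defaultdict


def reaction_norm_slopes(records):
    """Per-animal least-squares slope of production on disturbance level.

    records: iterable of (animal_id, disturbance_level, production), where
    disturbance_level is a farm-level covariate such as ambient temperature
    (assumed layout). Animals whose slope cannot be estimated are omitted.
    """
    by_animal = defaultdict(list)
    for animal, level, production in records:
        by_animal[animal].append((level, production))

    slopes = {}
    for animal, pairs in by_animal.items():
        n = len(pairs)
        if n < 2:
            continue  # a slope needs at least two records
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        if sxx == 0:
            continue  # no variation in the disturbance level
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        slopes[animal] = sxy / sxx
    return slopes
```

An animal whose production is unchanged across disturbance levels gets a slope of 0, and a more strongly affected animal gets a steeper negative slope, matching the interpretation given for the reaction norm.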

Little is known about the usefulness of the proposed resilience indicators based on deviations for livestock genetics: no studies have investigated the autocorrelations of deviations and the skewness of deviations, and only a few studies have investigated the variance of deviations. Elgersma et al. (2018) investigated the raw phenotypic variance of daily milk yield over the whole milking period and its relation to health and longevity traits in dairy cows. The raw variance of milk yield was heritable (0.10; Elgersma et al., 2018). Moreover, on a genetic level, cows with a low variance in milk yield deviations had significantly fewer production-related diseases [i.e., udder health, genetic correlation (r<sub>g</sub>) = −0.36; ketosis, r<sub>g</sub> = −0.52] and a higher longevity (r<sub>g</sub> = −0.30), suggesting a higher resilience (Elgersma et al., 2018). Putz et al. (2018) showed similar results for pigs kept in a "natural challenge environment": the variance of deviations in daily feed intake and deviations in daily duration at feeder during finishing phase were positively genetically correlated to mortality and number of treatments (Putz et al., 2018), indicating that pigs with lower variance have lower mortality and need fewer treatments. This means that both feed intake and feed duration are indicative of health status on a genetic level, and thus shows that variation in frequently measured traits can be indicative of resilience. Also, many other studies have investigated the heritability of uniformity in production traits, which is the same as variance in deviations: reported heritabilities ranged between 0.00 and 0.15 in almost all livestock species (see Hill and Mulder, 2010 for an overview up to 2010; Neves et al., 2011; Janhunen et al., 2012; Sae-Lim et al., 2015, 2017; Mulder et al., 2016; see Elgersma et al., 2018 for more).
Furthermore, the genetic coefficients of variation were generally moderate to high, which suggests a relatively large potential for genetic improvement in uniformity. These studies illustrate good potential for the use of production deviations as indicator traits for resilience.

## Conditions for Measuring Resilience Indicators

For successful use of resilience indicators in breeding programs, certain conditions apply: collection of observations should be on many animals, observations need to be collected frequently and over a longer period of time, the environment has to be challenging and diverse to estimate general resilience, and most importantly the resilience indicators have to be informative for resilience.

individual level of two dairy cows with the same underlying Wilmink curve (in red), but differing in resilience: in gray a less resilient dairy cow with higher fluctuations in milk yield, and in black a more resilient dairy cow with lower fluctuations in milk yield. (B) Shows measurements of carcass weight on family level of two families differing in resilience: in gray a less resilient family with higher fluctuations in carcass weight, and in black a more resilient family with lower fluctuations in carcass weight.

Production traits suitable for investigating resilience indicators based on deviations can be either measured repeatedly on individual level or on family level (see **Figure 1** for example). Repeated observations on individual level allow more accurate estimation of resilience indicators and their genetic variance and are therefore preferred. Suitable traits are for example milk yield, egg weight, body weight/growth, and feed intake. An important point that needs to be addressed is that these traits might differ in size of deviations depending on the (lactation) stage or age (for growth of young animals): standardizing deviations per stage or age might be required or treating different stages/ages as different traits. In case too few repeated observations on production traits are collected per individual, deviations of production traits collected on (sire-)family level are an alternative. Especially livestock kept in high numbers with a relatively low economic value (e.g., fish and poultry) or livestock kept in extensive farming systems (e.g., beef cattle and sheep) can benefit from such an approach to investigate resilience indicators. Suitable traits are for example slaughter/carcass traits (e.g., Ibáñez-Escriche et al., 2008) or traits recorded only during a restricted lifetime period (e.g., Neves et al., 2011; Mulder et al., 2016; Iung et al., 2017). It is to be expected that technical developments in the (near) future will allow investigation of deviations based on repeated observations on the same individual for all livestock species.

Care has to be taken to estimate deviations properly: less resilient animals will have more fluctuations in production, but as a consequence the expected production (based on own observations) will be lower. This means that deviations will be lower if simply an average expected production is used, and resilience of the animal is overestimated. For example, Elgersma et al. (2018) concluded that the raw phenotypic variance of daily milk yield is a composite measure of the residual variance and the shape of the lactation curve (Elgersma et al., 2018). They suggested to model the individual's lactation curve and use variance of deviations from the lactation curve as a better resilience indicator (Elgersma et al., 2018), as was done by Codrea et al. (2011). However, an individual permanently affected by disturbances (or a disturbance) will have a lactation curve lower than its true potential without (any) disturbances. As a consequence, deviations based on the individual's lactation curve will be lower than when compared to the individual's potential lactation curve. Modeling lactation curves based on the observed data might absorb part of the variance in deviations. Instead, deviations might be based on the difference between observed production and an individual's potential production, e.g., based on its (G)EBV. Determining the expected production of an animal is one of the major challenges for unbiased estimation of the proposed resilience indicators, and requires further investigation in the near future.
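As a minimal sketch of how the three deviation-based indicators could be computed once an expected production is available, the following Python example uses hypothetical daily milk yields and a flat expected yield. The function name `resilience_indicators`, the data, and the use of simple moment estimators are assumptions made for illustration only. In practice the expected production would come from a fitted curve or a (G)EBV-based potential, as discussed above.

```python
from statistics import fmean


def resilience_indicators(observed, expected):
    """Variance, lag-1 autocorrelation, and skewness of production deviations."""
    deviations = [o - e for o, e in zip(observed, expected)]
    n = len(deviations)
    mean = fmean(deviations)
    centered = [dev - mean for dev in deviations]
    variance = sum(c ** 2 for c in centered) / n
    # Lag-1 autocorrelation: covariance of consecutive deviations over the variance.
    lag1 = sum(a * b for a, b in zip(centered, centered[1:])) / (n * variance)
    # Skewness: third central moment scaled by variance^(3/2).
    skewness = (sum(c ** 3 for c in centered) / n) / variance ** 1.5
    return {"variance": variance, "autocorrelation": lag1, "skewness": skewness}


# Hypothetical daily milk yields (kg) against a flat expected yield of 30 kg.
expected = [30.0] * 10
stable_cow = [30.2, 29.8, 30.1, 29.9, 30.0, 30.1, 29.9, 30.2, 29.8, 30.0]
disturbed_cow = [30.0, 30.1, 26.0, 27.0, 28.0, 29.0, 29.8, 30.0, 30.1, 30.0]

stable = resilience_indicators(stable_cow, expected)
disturbed = resilience_indicators(disturbed_cow, expected)
print(stable)
print(disturbed)
```

In this toy data set the cow with a dip and slow recovery shows a larger variance, a more positive autocorrelation, and a negative skewness compared with the stable cow, which is the pattern the proposed indicators are meant to capture.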

To properly estimate general resilience of individuals, two factors are essential: the length of the observation period and the frequency of the observations. First, the length of the observation period should be sufficiently long to allow different types of disturbances to occur, as was suggested by Mulder et al. (2013): in stressful environments more genetic variation for micro-environmental sensitivity can be observed compared to less stressful environments (Mulder et al., 2013). A potential risk is that results are not reproducible between or even within an environment: for instance, resilience based on disturbances caused by one particular type (e.g., diseases) does not reflect resilience toward other types of disturbance (e.g., heat), and therefore does not represent general resilience. Second, the frequency of observations should be sufficiently high to capture deviations caused by disturbances. Elgersma et al. (2018) suggested that earlier studies might have underestimated the potential of resilience indicators due to too long test-day intervals, e.g., monthly milk yield records. Also in case family deviations are used, there is a substantial risk that this does not capture any disturbance (in case no or small deviations were present), let alone a diverse set of disturbance types to estimate general resilience. In contrast, too small time periods may invoke larger noise, and it may therefore become harder to find the relevant information for resilience. The time period depends on the trait measured. Future research should focus on finding the optimal time period for determining deviations by looking at the accuracy of EBV, and the genetic correlations between the resilience indicator(s) and the existing health traits. Regardless of that, resilience is ideally estimated in an environment that has all types of disturbances for the whole production period with frequent measurements.

Recent technological developments allow a tremendous increase in number of observations and number of observed phenotypes: big/longitudinal data and new phenotypes will result in more and new data on individuals to more accurately estimate deviations and consequently genetic parameters. Currently, many (breeding) organizations make use of routine data collection. Automatic milking systems (AMS) and automatic feeding systems (AFS) for cattle and pigs are the most well-known and well-developed examples. Other developments currently under investigation are for example automatic weighing systems and automatic recording of egg production in group housing. Collection of more data and new phenotypes are expected to increase in the near future, which will aid in (better) estimation of genetic variation of deviations: higher heritabilities are generally found with more observations (i.e., individual measurements) per individual or family (Hill and Mulder, 2010; Elgersma et al., 2018), suggesting that large datasets are required to accurately estimate genetic variance. In addition, even though heritability estimates of variance of deviations based on single records are often very low (around 0.01; see Hill and Mulder, 2010; Elgersma et al., 2018 for examples), the heritability of residual variance of many of these observations of one animal is actually moderate (Damgaard et al., 2003; Kapell et al., 2011; Sell-Kubiak et al., 2015; Mulder et al., 2016; Elgersma et al., 2018). **Figure 2** shows the relationship between the heritability of the residual variance based on multiple individual records and the heritability of the residual variance based on one individual record. In summary, recent technological developments, such as AMS and AFS, allow or will allow collection of large datasets with more observations per animal (i.e., big data). This will greatly facilitate the use of resilience indicators in breeding programs of all livestock species.

## THE IMPORTANCE OF RESILIENCE IN THE BREEDING GOAL

Determining the breeding goal is one of the most important elements of animal breeding. The breeding goal and the corresponding selection index determine the direction in which genetic improvement should take place. In all species, breeding goals have moved from primarily production-driven breeding goals to balanced breeding goals that aim for simultaneous improvement of production, efficiency, and health and functional traits (Olesen et al., 2003; Knap, 2009a; Neeteson-Van Nieuwenhoven et al., 2013; Miglior et al., 2017). Genetic improvement of resilience fits within the philosophy of balanced breeding, but, as far as we know, resilience is not (yet) included in breeding goals of livestock. If resilience has an impact on farm profit it should be in the breeding goal. In other words, if the economic value of resilience is nonzero, it should be a breeding goal trait. The question is, however, how can we determine the economic value for resilience.

To determine the economic value of resilience, we can consider the costs of a lack of resilience, for example higher production losses, (labor) costs of health treatments, veterinary costs, and labor costs of the farmer for observing animals that show signs of lack of resilience. When determining the economic value of traits, it is important to avoid double counting, and this applies in particular if resilience is defined as fluctuations in production. For example, production losses (i.e., deviations due to a lack of resilience) might already be captured by the production traits, especially in dairy cows. Furthermore, costs of health treatments might already be accounted for in the breeding goal, for instance in the case of mastitis in dairy cattle. Treatment costs of mastitis, or costs of discarded milk, should then not be accounted for in resilience. On the other hand, production losses and costs of diseases are not always included in the breeding goal of, for instance, pigs and poultry: there is a lack of health traits in these breeding goals, and observed production losses in commercial or crossbred environments due to disturbances may not be observed in the high health selection environment. Therefore, the economic value of resilience in pigs and poultry may include production losses in the commercial environment and health costs. In any case, the economic value can, for example, be based on labor costs for observing animals that show signs of disease or other problems, e.g., alerts or visual signs. These costs are often overlooked, because they are considered to be part of day-to-day management. However, if the number of animals per farm employee is increasing and labor time is restricted, this is clearly associated with the farmer's requirement for healthy and easy-to-manage animals. Genetic improvement in resilience would reduce labor requirements and would allow the farmer to keep more animals. Resilience should therefore be included in breeding goals.

For sustainable animal breeding, environmental and societal concerns have to be taken into account, in addition to economic concerns (Olesen et al., 2000; Nielsen and Amer, 2007). In other words, in addition to an economic value for resilience, a non-economic value might be present. Non-economic values can require extensive work to determine (e.g., Nielsen et al., 2011; Grimsrud et al., 2013), and could for example be based on improved health and welfare of animals, and job satisfaction of farmers. However, for the sake of merely illustrating resilience as a concept, we will not take non-economic values into account in this paper.

Next, we will use the example of labor costs to show the potential of including resilience in breeding goals. Additional labor costs for (lower) resilience are related to the probability that the animal generates an alert. An alert is a warning that might indicate that an animal is influenced by a disturbance. An alert can be generated either by visual inspection, or by sensors, AMS or AFS. We assume that alerts are generated when a trait (with a normal distribution and an individual-specific variance) exceeds a fixed threshold value that is based on the population variance (e.g., a threshold that belongs to the population-wide 1% probability). This trait could be for instance milk yield or body weight, but likely not the previously proposed resilience indicators. The proposed resilience indicators are expected to detect problems too late, and we propose to use them to breed for more resilient animals rather than to use them as predictors of alerts. Breeding for more resilient animals will result in offspring with a smaller variance in their sensor values than the current generation, and thus a smaller probability of generating an alert (**Figure 3**).
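The alert mechanism described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: deviations are taken to be normally distributed around zero, the alert is one-sided (a drop below a lower threshold), and the function name `alert_probability` and its parameters are hypothetical.

```python
from statistics import NormalDist


def alert_probability(sd_individual, sd_population=1.0, population_alert_rate=0.01):
    """Probability that an individual crosses a threshold fixed at population level."""
    # The threshold is set once, from the population-wide alert rate (lower tail).
    threshold = NormalDist().inv_cdf(population_alert_rate) * sd_population
    # An individual with its own standard deviation crosses it more or less often.
    return NormalDist(mu=0.0, sigma=sd_individual).cdf(threshold)


for sd in (0.9, 1.0, 1.1):
    print(sd, round(alert_probability(sd), 4))
```

An animal whose standard deviation equals the population value generates alerts at the population rate, an animal with a smaller standard deviation generates fewer, and an animal with a larger one generates more.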

We will illustrate the example of using labor costs to derive economic values in two situations: in case labor time is unrestricted, and, more realistically, in case labor time is restricted per farm (i.e., maximal number of animals given farm conditions). In addition, we will show how a combined economic value for resilience can be determined based on multiple components, in this case simplified to labor costs and (health) treatment costs. For simplicity, we choose variance as the resilience indicator for (further) illustration. We also assume that breeding for a lower variance will decrease the probability of generating alerts, and that the genetic correlation between the variance and resilience in the breeding goal is 1. We hypothesize that breeding for more resilient animals will result in offspring with an autocorrelation closer to zero, a skewness closer to zero, and a slope closer to zero. In all cases, the offspring are expected to have a lower probability of generating an alert, because the underlying concept is the same, although it is less obvious than with a smaller variance.

#### Economic Value for Resilience Based on One Component

If labor time is unrestricted, the economic value of resilience is the change in expected number of alerts multiplied by the time per alert and the labor cost per time unit over the whole production cycle of an individual; i.e., using Equation 8b from Mulder et al. (2008), the economic value of resilience (v<sub>resilience, unrestricted labor time</sub>) is:

$$v_{\text{resilience, unrestricted labor time}} = 0.5 \times z \times x \times l_a \times c_l \times d \tag{1}$$

where z is the ordinate of the standard normal distribution at the standardized threshold x of the alert (e.g., the threshold that belongs to the 1% probability), l<sub>a</sub> is the labor time required for dealing with the alert, c<sub>l</sub> is the labor cost per time unit, and d is the number of days of the finisher period.

If labor time is restricted per farm (i.e., the number of animals on a farm is maximized given a certain farm management), we assume that the total available time (L) is constant, and average time available per animal for normal management (l<sub>n</sub>) cannot be changed. We also assume that the average time per animal required for dealing with alerts (l<sub>r</sub> = l<sub>a</sub> × p<sub>a</sub>, with p<sub>a</sub> being the probability of obtaining an alert) can be changed by selection, i.e., selection for resilience reduces the number of alerts per animal over the whole production cycle. The total profit of a farm is:

$$\text{Profit} = n \times (\text{Revenues} - \text{Costs}) \tag{2}$$

where n is the number of animals per farm, equal to:

$$n = \frac{L}{l_n + l_r} \tag{3}$$

Rewriting Equation (2) using Equation (3) results in:

$$\text{Profit} = \frac{L}{l_n + l_r} \times (\text{Revenues} - \text{Costs}) \tag{4}$$

To obtain the economic value, the derivative of Equation (4) with respect to l<sub>r</sub> is required, being:

$$\frac{d\text{Profit}}{dl_r} = \frac{-L}{\left(l_n + l_r\right)^2} \times (\text{Revenues} - \text{Costs}) \tag{5}$$

The economic value must be expressed per animal and therefore Equation (5) is divided by Equation (3) to obtain the improvement in profit when l<sub>r</sub> changes by 1 time unit:

$$\frac{d\text{Profit}}{dl_r} \Big/ \text{animal} = \frac{-(\text{Revenues} - \text{Costs})}{l_n + l_r} \tag{6}$$

In other words, Equation (6) is the change in profit when changing l<sub>r</sub> by 1 time unit, and can be interpreted as the cost price of labor spent on dealing with alerts for the total period an animal is kept, i.e., the product c<sub>l</sub> × d in Equation (1). In fact, the economic value shows the increase or decrease in farm profit due to higher or lower resilience, because more or fewer animals can be kept on the farm, if labor is restricted.

To obtain the economic value on the basis of a difference in resilience, Equation (1) (unrestricted labor time) is adjusted based on the different cost price of restricted labor time of Equation (6). This results in:

$$v_{\text{resilience, restricted labor time}} = 0.5 \times z \times x \times l_a \times \frac{-(\text{Revenues} - \text{Costs})}{l_n + l_r} \tag{7}$$

To show the impact of unrestricted labor time and restricted labor time per farm, we calculate the economic values for resilience for a farm with finisher pigs. We assume 8 labor h/day, 15 currency units/h labor costs, 10 currency units profit per animal (i.e., the economic value of growth), a 1% alert probability (i.e., x = −2.33), 5 min attention time per alert (l<sub>a</sub>), and 125 days (d) to grow from 25 kg to 125 kg (i.e., average daily gain is 800 g/day). **Figure 4** shows that the economic value of resilience is constant when labor time is unrestricted. However, the economic value of resilience increases with increasing farm size when labor time is restricted per farm (**Figure 4**). The two situations lead to an equal economic value when the farm size is 1,500 finisher pigs, because in that case the income of the farmer would be 15 currency units/h, equal to the price of unrestricted labor. The economic value of resilience based on restricted labor time reaches more than 60% of the economic value of growth (in our example) with a farm size of 2,000 pigs, and would keep increasing with increasing farm size. Improving resilience of animals would thus allow more animals per farm (i.e., intensification). In fact, with restricted labor time, the time for normal management (l<sub>n</sub>) per animal decreases with an increase in number of animals and a constant amount of labor time available. Therefore, the proportion of time for alerts (l<sub>r</sub>) increases and the proportion of time for normal management (l<sub>n</sub>) decreases with farm size. As a consequence, an increase or decrease in resilience has a larger impact on profit of the farm when the farm size increases. Nevertheless, both situations show that the economic value of resilience would be negative, meaning that reducing deviations will have a beneficial effect (i.e., an increase) on farm profit and thus resilience should be included in the breeding goal.
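The finisher-pig example can be reproduced approximately with a short Python sketch. Several choices here are assumptions rather than statements from the paper: L is read as the daily labor time (8 h) with l<sub>n</sub> and l<sub>r</sub> expressed per animal per day, and the restricted case uses the magnitude of Equation 6 as the cost price of labor, so that both economic values come out negative as described in the text (Equation 7 as printed carries the minus sign of Equation 6). Function and constant names are hypothetical.

```python
from statistics import NormalDist

X = -2.33                 # standardized alert threshold (1% alert probability)
Z = NormalDist().pdf(X)   # ordinate of the standard normal distribution at X
L_A = 5 / 60              # attention time per alert, in hours
C_L = 15.0                # labor cost, currency units per hour
D = 125                   # days in the finisher period
PROFIT = 10.0             # Revenues - Costs per animal, currency units
L_DAY = 8.0               # labor hours available per day (assumed reading of L)


def ev_unrestricted():
    """Equation 1: labor is valued at its market price c_l over d days."""
    return 0.5 * Z * X * L_A * C_L * D


def ev_restricted(n_animals):
    """Equation 7, using the magnitude of Equation 6 as the cost price of labor."""
    hours_per_animal_per_day = L_DAY / n_animals  # l_n + l_r, from Equation 3
    cost_price = PROFIT / hours_per_animal_per_day
    return 0.5 * Z * X * L_A * cost_price


for n in (1000, 1500, 2000):
    print(n, round(ev_unrestricted(), 2), round(ev_restricted(n), 2))
```

Under these assumptions the two values coincide at 1,500 pigs, and at 2,000 pigs the restricted value exceeds 60% of the 10 currency units profit per animal in absolute terms, in line with the description of **Figure 4**.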

#### Economic Value for Resilience Based on Multiple Components

In the situation above, we considered only labor costs. Next, we extend the economic value of resilience to include for instance treatment costs for a disease, in case these costs are not yet included in the breeding goal. The economic value for resilience is:

$$v_{\text{resilience}} = v_{\text{labor}} + v_{\text{treatment}} \tag{8}$$

with v<sub>labor</sub> being the economic value for labor costs (based either on unrestricted or restricted labor time) and v<sub>treatment</sub> being the economic value for treatment costs, which can be defined as:

$$v_{\text{treatment}} = 0.5 \times z \times x \times p_{\text{treatment}} \times \text{Cost}_{\text{treatment}} \tag{9}$$

where z is the ordinate of the standard normal distribution at the standardized threshold x of the alert (e.g., the threshold that belongs to the 1% probability), p<sub>treatment</sub> is the probability of a treatment given that the animal got an alert, and Cost<sub>treatment</sub> is the cost of the treatments. Note that in this case, treatment costs must not be part of other components of the selection index to avoid double counting of these costs. This small extended example shows that different components, based on costs, can be relatively easily included in the economic value of resilience. As shown in **Figure 4**, the economic value of resilience can easily reach 60% of the economic value of growth, but these extra components can make the economic value even larger than the economic value of growth.
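To illustrate how a second component adds to the economic value, the sketch below evaluates Equations 8 and 9 in Python. The labor component uses Equation 1 with the finisher-pig inputs given in the text. The treatment probability and treatment cost are hypothetical values chosen only for illustration, and Equation 9 is applied exactly as printed.

```python
from statistics import NormalDist

x = -2.33                  # standardized alert threshold (1% alert probability)
z = NormalDist().pdf(x)    # ordinate of the standard normal distribution at x

# Labor component from Equation 1 with the finisher-pig inputs of the text:
# 5 min per alert, 15 currency units/h, 125 days.
v_labor = 0.5 * z * x * (5 / 60) * 15.0 * 125

# Hypothetical treatment inputs (not taken from the paper).
p_treatment = 0.5          # probability of a treatment given an alert
cost_treatment = 20.0      # cost per treatment, currency units

v_treatment = 0.5 * z * x * p_treatment * cost_treatment  # Equation 9
v_resilience = v_labor + v_treatment                      # Equation 8
print(round(v_labor, 2), round(v_treatment, 2), round(v_resilience, 2))
```

Because both components are negative, adding treatment costs makes the economic value of resilience more negative, i.e., reducing the variance becomes more valuable.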

#### THE ADDED VALUE OF RESILIENCE IN BREEDING PROGRAMS

We will now show the added value of estimated breeding values for resilience to breed healthy and easy-to-manage animals in two livestock species: (1) a pig scenario, and (2) a dairy cattle scenario. In both cases, the selection indices will be simplified in order to draw general conclusions. Selection is based on truncation selection on the index (I) to maximize response in the breeding goal (H). A simplified genomics scheme is simulated. Calculation of the responses to selection were done in SelAction v2.1 (Rutten et al., 2002), using the principle of Dekkers (2007) to include genomic information.

#### Pig Scenario

In this example, we describe a simplified pig scenario, in which individuals are only selected on growth rate and the breeding goal is extended with resilience. We explore the effect of adding a resilience indicator (e.g., variance of deviations) to the selection index and assume a genetic correlation of 1 between the resilience indicator and resilience in the breeding goal. Assumptions made are:


More information about input can be found in **Supplementary Material**.

**Figure 5** shows the selection response of growth rate and resilience. The desired direction of a breeding program aimed at simultaneously improving growth rate and resilience, i.e., reducing the variance, is the bottom right corner of each graph. Thus, in case of an unfavorable positive genetic correlation between growth rate and resilience, and if resilience indicators are not included in the selection index, no progress can be made to obtain more resilient animals (**Figure 5A**). However, including a resilience indicator in the selection index of pigs can result in a higher selection response in the breeding goal (H) and more resilient animals, depending on the chosen economic values (**Figure 5B**). For example, with an economic value of −0.6 for resilience (and an economic value of 1 for growth rate), the selection response in H is improved by 14.6% when a resilience indicator is included in the selection index. Although in this case a reduction in the selection response of growth rate is observed (−15.4%), the selection response of resilience improves by 185.3% (see red crosses in **Figures 5A,B**). Not including a resilience indicator in the selection index increases the probability to generate an alert to 1.16% (start: 1%), while including a resilience indicator in the selection index reduces the probability to generate an alert to 0.88%. This corresponds to a reduction of 24.7% in the number of alerts. In case of a favorable genetic correlation between growth rate and resilience, the increase in selection response in H is even higher when comparing a selection index without (**Figure 5C**) and with a resilience indicator (**Figure 5D**): +42.8% for an economic value of −1.6 for resilience. This simplified example shows that including resilience indicators in the selection index can have a large impact on the selection response and number of alerts.
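The effect of adding an indicator to the index can be illustrated with classical selection index theory (Hazel, 1943), where the index weights are b = P⁻¹Ga. The Python sketch below is not the SelAction calculation used in the paper: it assumes own-performance records only (no genomic information), traits scaled to unit phenotypic variance, and hypothetical heritabilities, correlations, and selection intensity. Only the economic values (1 and −0.6) are taken from the text. The function names are hypothetical.

```python
from math import sqrt


def optimal_weights(P, G, a):
    """Hazel index weights b = P^-1 G a for two traits (2x2 inverse by hand)."""
    ga = [sum(G[i][j] * a[j] for j in range(2)) for i in range(2)]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [(P[1][1] * ga[0] - P[0][1] * ga[1]) / det,
            (P[0][0] * ga[1] - P[1][0] * ga[0]) / det]


def index_response(b, P, G, a, intensity=1.0):
    """Response per trait and in the breeding goal H for index weights b."""
    var_index = sum(b[i] * P[i][j] * b[j] for i in range(2) for j in range(2))
    cov_index_trait = [sum(b[i] * G[i][j] for i in range(2)) for j in range(2)]
    per_trait = [intensity * c / sqrt(var_index) for c in cov_index_trait]
    in_goal = sum(a[j] * per_trait[j] for j in range(2))
    return per_trait, in_goal


# Hypothetical parameters: trait 0 = growth rate, trait 1 = variance of deviations.
h2_growth, h2_res = 0.30, 0.10
r_g, r_p = 0.30, 0.10      # unfavorable (positive) genetic correlation
cov_g = r_g * sqrt(h2_growth * h2_res)
P = [[1.0, r_p], [r_p, 1.0]]            # phenotypic (co)variances
G = [[h2_growth, cov_g], [cov_g, h2_res]]  # genetic (co)variances
a = [1.0, -0.6]            # economic values from the text example

growth_only = index_response([1.0, 0.0], P, G, a)
with_indicator = index_response(optimal_weights(P, G, a), P, G, a)
print("growth only   :", growth_only)
print("with indicator:", with_indicator)
```

With these illustrative inputs, adding the indicator to the index gives a response in H that is at least as large as selection on growth alone, and a smaller (less unfavorable) correlated increase in the variance. The magnitudes differ from those in **Figure 5** because the inputs and information sources differ.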

#### Dairy Cattle Scenario

In this example, we describe a simplified dairy cattle scenario, though with a more complex breeding program than the pig scenario: individuals are selected on milk yield and health-related traits, which are in this case longevity and udder health. We explore the effect of adding a resilience indicator (e.g., variance of deviations) to the selection index, i.e., selecting on a resilience indicator. Assumptions made are:

• Resilience has a heritability of 0.10 (Elgersma et al., 2018). The genetic correlation between resilience and milk yield is 0.61, between resilience and longevity is −0.30, and between resilience and udder health is −0.36. In other words, a higher resilience (i.e., a lower variance of deviations) is genetically correlated with a lower milk yield, a higher longevity, and a better udder health. These estimates were obtained from Elgersma et al. (2018) and CRV (2015).


More information about input can be found in **Supplementary Material**.

Similar to the pig scenario, including resilience in the breeding goal of dairy cattle can result in a higher selection response in H compared to not including resilience in the breeding goal, depending on the chosen economic values (**Figure 6**), even though health-related traits (i.e., longevity and udder health) are already included in the breeding goal and selection index. If resilience, being the variance in this case, has an economic value of −0.2, the selection response in H increases by 3.0% (see black cross in **Figure 6**): including a resilience indicator in the selection index compensates the loss in milk yield (6.3%) by an improvement in longevity (1.4%), udder health (1.0%), and resilience (−102.6%) (**Table 2**). Not including a resilience indicator in the selection index reduces the probability to generate an alert to 0.92% (start: 1%), while including a resilience indicator in the selection index reduces the probability to generate an alert to 0.84%. This corresponds to a reduction of 8.4% in the number of alerts (see red cross in **Figure 6**). This simplified example shows that resilience indicators can have a beneficial impact. However, the effect is smaller in the dairy cattle scenario than in the pig scenario, because of the presence of health-related traits in the selection index.

#### Perspectives and Other Livestock Species

The potential of resilience in breeding goals was clearly illustrated in the two scenarios. Obviously these scenarios overestimate the impact of resilience indicators, because of their simplification. Nevertheless, the underlying idea holds, because a reduction of time spent on an animal with an alert (for any reason) will reduce costs and consequentially increase farm profit. The pig scenario was based on only two traits, but might still be fairly close to a pig sire line scenario. Sire line breeding programs are primarily focused on improvement of production traits, in contrast to dam line breeding programs, which are more focused on improvement of reproduction and maternal traits.

the probability an animal generates an alert in a default selection index without and with inclusion of a resilience indicator for various economic values in a dairy cattle breeding program. The default selection index contains milk yield, longevity, and udder health. The crosses are discussed in the text and shown in detail in Table 2.

TABLE 2 | Selection responses (in trait units) and alert probability (%) and their relative changes (in %) in a selection index without and with inclusion of a resilience indicator (economic value = −0.2) in a dairy cattle breeding program.


*The default selection index contains milk yield, longevity, and udder health. Selection responses are shown for milk yield, longevity, udder health, resilience, and the breeding goal (H). <sup>a</sup> In order to improve resilience of an animal, the resilience indicator trait (e.g., variation in deviations) has to be reduced. Negative selection response are therefore desired, and more negative values indicate higher resilience.*

Nevertheless, resilience can also be included in breeding programs for dam lines: multiplier farms require resilient sows and resilient piglets. For instance, litter size uniformity has a beneficial effect on piglet resilience based on survival (Damgaard et al., 2003; Mulder et al., 2015b). Favorable correlations were found between the residual variance of feed intake and feed duration with mortality and the number of health treatments in pigs in a challenge environment (Putz et al., 2018). This shows that the residual variance of feed intake and feed duration can be used to improve resilience. Dairy cattle breeding programs contain health(-related) traits, which are expected to partly cover resilience indicators. Indeed, several favorable r<sub>g</sub> between health traits and variance in deviations of milk yield were found, but none of them is equal to 1, indicating that the raw variance of milk yield contains new information about resilience and health (Elgersma et al., 2018). This was also shown in the dairy cattle scenario, which showed considerable improvement in response to a selection index with resilience. We propose that inclusion of resilience in selection indices of livestock breeding programs (similar to pigs and dairy cattle) will strongly increase the response in resilience of livestock, and likely increase the selection response in H as well.

We did not investigate resilience in breeding programs of other livestock, such as poultry, extensively kept livestock species (e.g., beef cattle and sheep), or aquaculture species, because resilience is more difficult to assess and apply in practice at this moment. First, repeated measurements are difficult to collect, mainly because animals cannot be measured individually due to group housing or difficulties catching individuals. Second, labor time spent on alerts is less relevant. Alerts are created for some livestock species based on group measurements (e.g., water intake), but not at the individual level. Also, these types of alerts represent epidemic disturbances, rather than day-to-day endemic disturbances. However, resilience indicators based on a relatively limited set of production data (both frequency of repeated observations and number of animals) can already provide valuable information on health of animals (e.g., 4-weekly body weight deviations; Berghof et al., in preparation), which is currently not incorporated into the selection indices. More importantly, the development of new techniques in the near future will allow collection of daily observations on an individual level, such as individual laying nests, individual measurements of fish (without capturing), automatic collection of data, and individual tracking and measurements with camera or drones. This might result in collection of big data and the definition of new phenotypes, and can eventually result in the use of resilience indicators in breeding programs for all livestock species.

#### CONCLUSION

This paper shows that including resilience in breeding programs has great potential to obtain healthy and easy-to-manage livestock. Resilience indicators can be based on deviations between observed production and expected production. Of

#### REFERENCES


particular interest are variance of deviations, autocorrelation of deviations, and skewness of deviations. Also the slope of the reaction norm might contain information, though limited to macro-environmental disturbances. An economic value for resilience indicators in the selection index can be determined based on reduced labor costs and health costs, provided that these costs are not accounted for in other traits in the selection indices. For most farms, where labor time is restricted, the economic value of resilience increases with an increasing number of animals per farm. This paper also shows the additional benefit of including resilience in the breeding goal: in both the pig and dairy cattle scenarios, we show improvements in the selection response in the breeding goals and in particular the improvement of resilience by including resilience in the breeding goal. The rapid technological development on massive collection of data (i.e., big data) is only expected to increase in the near future, resulting in more data available. The accompanying possibilities to utilize these data to determine resilience indicators, will greatly facilitate breeding for improved resilience in all livestock species.

### AUTHOR CONTRIBUTIONS

TB and HM conceived the project. TB, MP, and HM developed the ideas about measuring resilience. HM derived the economic value of resilience, and set up the breeding scenarios. TB performed the simulations in SelAction. TB wrote the (first draft of) the paper. MP and HM contributed to writing the paper.

#### FUNDING

This research was financially supported by Netherlands Organization for Scientific Research Earth and Life Sciences (NWO-ALW; project ALWSA.2016.4), and by the Dutch Ministry of Economic Affairs (TKI Agri & Food project 16022) and the Breed4Food partners Cobb Europe, CRV, Hendrix Genetics and Topigs Norsvin.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00692/full#supplementary-material


**Conflict of Interest Statement:** This research was partly financed by the Breed4Food partners Cobb Europe, CRV, Hendrix Genetics, and Topigs Norsvin. Except for the financial contribution, no other shared interests (e.g., employment, consultancy, patents, products) exist between the Breed4Food partners and the authors. This paper was neither discussed nor reviewed by any of the Breed4Food partners.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Berghof, Poppe and Mulder. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Assessment of Autozygosity Derived From Runs of Homozygosity in Jinhua Pigs Disclosed by Sequencing Data

Zhong Xu<sup>1</sup> , Hao Sun<sup>1</sup> , Zhe Zhang<sup>1</sup> , Qingbo Zhao<sup>1</sup> , Babatunde Shittu Olasege<sup>1</sup> , Qiumeng Li<sup>1</sup> , Yang Yue<sup>1</sup> , Peipei Ma<sup>1</sup> , Xiangzhe Zhang<sup>1</sup> , Qishan Wang<sup>1</sup> \* and Yuchun Pan<sup>1,2</sup> \*

<sup>1</sup> Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China, <sup>2</sup> Shanghai Key Laboratory of Veterinary Biotechnology, Shanghai, China

The Jinhua pig, a well-known Chinese indigenous breed, has evolved as a breed with excellent meat quality, greater disease resistance, and higher prolificacy. The reduction in the number of Jinhua pigs over the past years has raised concerns about inbreeding. Runs of homozygosity (ROH) along the genome have been applied to quantify individual autozygosity to improve the understanding of inbreeding depression and identify genes associated with traits of interest. Here, we investigated the occurrence and distribution of ROH using next-generation sequencing data to characterize autozygosity in 202 Jinhua pigs, as well as to identify the genomic regions with high ROH frequencies within individuals. The average inbreeding coefficient, based on ROH longer than 1 Mb, was 0.168 ± 0.052. In total, 18,690 ROH were identified in all individuals, among which shorter segments (1–5 Mb) predominated. Individual ROH autosome coverage ranged from 5.32 to 29.14% in the Jinhua population. On average, approximately 16.8% of the whole genome was covered by ROH segments, with the lowest coverage on SSC11 and the highest coverage on SSC17. A total of 824 SNPs (about 0.5% of all SNPs), each occurring in ROH in over 45% of the samples, and 11 ROH island regions were identified. Genes associated with reproduction (HOXA3, HOXA7, HOXA10, and HOXA11), meat quality (MYOD1, LPIN3, and CTNNBL1), appetite (NUCB2) and disease resistance traits (MUC4, MUC13, MUC20, LMLN, ITGB5, HEG1, SLC12A8, and MYLK) were identified in ROH islands. Moreover, several quantitative trait loci for ham weight and ham fat thickness were detected. Genes in ROH islands suggested, at least partially, selection for economic traits and environmental adaptation, and should be the subject of future investigation. These findings contribute to the understanding of the effects of environmental and artificial selection in shaping the distribution of functional variants in the pig genome.

Keywords: pig, runs of homozygosity, inbreeding coefficients, selection, animal breeding

#### Edited by:

Andrea B. Doeschl-Wilson, Roslin Institute, University of Edinburgh, United Kingdom

#### Reviewed by:

Maria Saura, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Spain Silvia Teresa Rodriguez Ramilo, Institut National de la Recherche Agronomique (INRA), France

#### \*Correspondence:

Qishan Wang wangqishan@sjtu.edu.cn Yuchun Pan panyuchun1963@aliyun.com

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 20 June 2018 Accepted: 12 March 2019 Published: 28 March 2019

#### Citation:

Xu Z, Sun H, Zhang Z, Zhao Q, Olasege BS, Li Q, Yue Y, Ma P, Zhang X, Wang Q and Pan Y (2019) Assessment of Autozygosity Derived From Runs of Homozygosity in Jinhua Pigs Disclosed by Sequencing Data. Front. Genet. 10:274. doi: 10.3389/fgene.2019.00274

## INTRODUCTION


Autozygosity refers to homozygosity in which the two alleles are identical by descent (IBD). It can result from several different phenomena, such as genetic drift, consanguineous matings, population bottlenecks, and natural and artificial selection (Curik et al., 2014). The Jinhua pig, a valuable natural resource, is a well-known indigenous breed of eastern China that has evolved as a breed with excellent meat quality, greater disease resistance, higher prolificacy, and greater adaptability to hot and humid climates (Gao et al., 2014). Owing to their superior meat quality, Jinhua pigs are used to produce Jinhua Ham, a famous ham in China (Miao et al., 2009). The number of Jinhua pigs has been decreasing over the last two decades as a result of the large-scale import of Western pig breeds to improve the leanness of pork (China National Commission of Animal Genetic Resources, 2011). Deficient control of inbreeding may reduce genetic variability and therefore the effective population size (N<sub>e</sub>), a key parameter that influences conservation planning and determines the rate of change in the composition of a population caused by genetic drift (Charlesworth, 2009). In addition, inbreeding may increase the frequency of autozygosity for deleterious alleles, with a consequent reduction in individual performance (Ouborg et al., 2010). For these reasons, there is growing interest in characterizing and understanding inbreeding and autozygosity in Jinhua pigs. This would help to better preserve genetic diversity and support the long-term viability of breeding programs for this breed.

Runs of homozygosity (ROH) are contiguous homozygous segments of the genome where the two haplotypes inherited from the parents are identical (Gibson et al., 2006). The development of high-density single nucleotide polymorphism (SNP) markers has made it possible to scan the genome for ROH, which have been proposed as a proxy for detecting genomic regions where a reduction in heterozygosity has occurred (Howrigan et al., 2011). Nowadays, whole-genome inbreeding estimated from ROH is considered a powerful method to distinguish between recent and ancient inbreeding (Keller et al., 2011). As the expected length of a ROH is equal to 1/(2G) Morgans, where G is the number of generations since the common ancestor (Thompson, 2013), the number of generations can be inferred from the length and frequency of ROH (Howrigan et al., 2011). Autozygosity based on ROH can help to improve the understanding of inbreeding depression for a trait (Keller et al., 2011) and also help to identify genes associated with traits of economic interest located in ROH island regions (Purfield et al., 2017). In addition, given the stochastic nature of recombination, the occurrence of ROH is not randomly distributed across the genome, and ROH islands shared across a large number of samples may be the result of selective pressure (Zavarez et al., 2015). Recently, ROH have been used to explore signatures of selection in cattle (Peripolli et al., 2018), chicken (Marchesi et al., 2018), and sheep (Mastrangelo et al., 2017), but less commonly in pigs, especially Chinese indigenous pigs such as the Jinhua pig.
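The relation between ROH length and the number of generations can be made concrete with a short sketch. It assumes a uniform recombination rate of 1 cM per Mb (the approximation adopted in the Materials and Methods); the function names are hypothetical and introduced only for illustration.

```python
def expected_roh_length_cm(generations: float) -> float:
    """Expected ROH length in cM: 1/(2G) Morgans equals 100/(2G) cM."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


def generations_from_length_mb(length_mb: float, cm_per_mb: float = 1.0) -> float:
    """Invert the relation: G = 100 / (2 * length in cM).

    cm_per_mb is an assumed genome-wide average recombination rate.
    """
    if length_mb <= 0 or cm_per_mb <= 0:
        raise ValueError("length and recombination rate must be positive")
    return 100.0 / (2.0 * length_mb * cm_per_mb)
```

Under this assumption, ROH of 10, 5, and 1 Mb correspond to common ancestors roughly 5, 10, and 50 generations ago, respectively.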

In this study, we investigated the occurrence and distribution of ROH in a sample of 202 Jinhua pigs in order to characterize genome-wide autozygosity levels and to detect potential ROH islands that may provide insights into past selection events in this population. In addition, other parameters describing the level of genetic variability, including N<sub>e</sub> and different measures of inbreeding from pedigree and genomic information, were investigated. For this purpose, we used genotyping by genome reducing and sequencing (GGRS) (Chen et al., 2013), which has been successfully applied to evaluate genetic diversity in Chinese indigenous pig breeds in the Taihu region (Wang et al., 2015).

#### MATERIALS AND METHODS

#### Population and Sequencing

Ear tissue samples were collected from 202 Jinhua pigs (189 females and 13 males) from conservation pig farms in Zhejiang province. The pigs were born between 2014 and 2017, with an average pedigree depth of about four generations. Genomic DNA was extracted using a commercial kit (Lifefeng Biotech, Co., Ltd., Shanghai, China), and its integrity and purity were verified by agarose gel electrophoresis and the A260/280 ratio. The genomic DNA samples were genotyped using the GGRS protocol (Chen et al., 2013). Quality control (QC) of ∼1.4 billion raw reads was performed using NGS QC Toolkit v2.3 (Patel and Jain, 2012). The clean sequencing reads were mapped to the latest released pig reference genome (Sscrofa11.1) using BWA (Li and Durbin, 2010). SNP calling was performed using SAMTOOLS v0.1.19, and missing genotypes were imputed using BEAGLE (Howie et al., 2009; Li et al., 2009). Additional quality controls were applied following Xiao et al. (2017): SNPs were retained if they were genotyped in more than 30% of samples, had a calling quality > 20 (99% accuracy), and had a minor allele frequency (MAF) ≥ 5%. SNPs mapped to sex chromosomes were excluded from the analyses.
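The SNP-level filters described above can be summarized in a minimal sketch. The `SnpSummary` record and the function names are hypothetical; they are not part of the GGRS pipeline, and the calling quality is assumed to be Phred-scaled, which is consistent with a score of 20 corresponding to 99% accuracy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary used only for this illustration."""
    snp_id: str
    chrom: str
    call_rate: float     # fraction of samples with a genotype call
    call_quality: float  # assumed Phred-scaled calling quality
    maf: float           # minor allele frequency


def phred_to_accuracy(quality: float) -> float:
    """Convert a Phred-scaled quality to the probability of a correct call."""
    return 1.0 - 10.0 ** (-quality / 10.0)


def passes_qc(
    snp: SnpSummary,
    min_call_rate: float = 0.30,
    min_quality: float = 20.0,
    min_maf: float = 0.05,
    sex_chroms: Tuple[str, ...] = ("X", "Y"),
) -> bool:
    """Apply the thresholds stated in the text to a single SNP."""
    return (
        snp.chrom not in sex_chroms
        and snp.call_rate > min_call_rate
        and snp.call_quality > min_quality
        and snp.maf >= min_maf
    )
```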

To determine novel variants in our sequence data, we compared the identified SNPs with the dbSNP data (Build 152<sup>1</sup> ). These SNPs were annotated according to the Ensembl pig gene annotation set (Ensembl release 92<sup>2</sup> ) as previously reported by Wang et al. (2015).

#### Genetic Diversity

Observed (H<sub>o</sub>) and expected heterozygosity (H<sub>e</sub>) were estimated using PLINK v1.07 (Purcell et al., 2007). N<sub>e</sub> was estimated using SNeP v1.1 (Barbato et al., 2015). This approach estimates the historical effective population size from the relationship between LD, N<sub>e</sub>, and recombination rate:

$$N\_{e(t)} = \left(4f(c\_t)\right)^{-1} \left(E\left[r\_{adj}^2 \mid c\_t\right]^{-1} - \alpha\right) \tag{1}$$

where N<sub>e(t)</sub> is the effective population size t generations ago, with t = (2f(c<sub>t</sub>))<sup>−1</sup>; c<sub>t</sub> is the recombination rate for a specific physical distance between SNPs, estimated following Sved and Feldman (1973); f is the Haldane mapping function relating recombination rate to genetic distance measured in Morgans; r<sup>2</sup><sub>adj</sub> is the LD value corrected for sample size; and α is a correction for the occurrence of mutations.

<sup>1</sup> ftp://ftp.ncbi.nlm.nih.gov/snp

<sup>2</sup> ftp://ftp.ensembl.org/pub/release-92/gtf/sus\_scrofa/
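A minimal sketch of Equation (1) is given below. It is not the SNeP implementation: the function names are hypothetical, the LD value is assumed to be already corrected for sample size, the Haldane function is applied to a genetic distance given in Morgans, and the default α = 1 is an assumption chosen for illustration.

```python
import math
from typing import Tuple


def haldane_recombination(distance_morgans: float) -> float:
    """Haldane mapping function: recombination fraction for a genetic distance."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def ne_from_ld(
    r2_adj: float, distance_morgans: float, alpha: float = 1.0
) -> Tuple[float, float]:
    """Return (generations ago, Ne) following Equation (1).

    r2_adj: mean sample-size-corrected r^2 for SNP pairs at this distance.
    alpha:  mutation correction (the default of 1 is an assumption).
    """
    if not 0.0 < r2_adj <= 1.0 or distance_morgans <= 0.0:
        raise ValueError("r2_adj must be in (0, 1] and distance must be positive")
    c = haldane_recombination(distance_morgans)
    generations_ago = 1.0 / (2.0 * c)
    ne = (1.0 / (4.0 * c)) * (1.0 / r2_adj - alpha)
    return generations_ago, ne
```

For example, SNP pairs about 0.005 Morgans apart inform N<sub>e</sub> roughly 100 generations ago, and a lower mean LD at that distance implies a larger N<sub>e</sub>.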

Linkage disequilibrium between SNP pairs was estimated using PLINK v1.07 (Purcell et al., 2007). Haplotype blocks were obtained with a confidence intervals algorithm and with the software Haploview (Barrett, 2009), which was also used to visualize haplotype patterns.

#### Measure of Runs of Homozygosity

Runs of homozygosity were identified for each individual using PLINK v1.07 (Purcell et al., 2007). The default parameters of the --homozyg option were used to define ROH (Peripolli et al., 2018), corresponding to the following criteria: (1) a sliding window of 50 SNPs across the genome; (2) one heterozygous and five missing calls allowed per window to account for genotyping error; (3) a minimum of 100 consecutive SNPs in a ROH; (4) a minimum ROH length of 1 Mb, to exclude short ROH derived from strong LD (Purfield et al., 2012); and (5) a required minimum SNP density of 1 SNP per 50 kb. Assuming an approximate genetic distance of 1 cM per 1 Mb (Zanella et al., 2016), a minimum ROH length of 1 Mb was expected to capture inbreeding up to 50 ancestral generations.
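The five criteria above map onto PLINK's ROH options as sketched below. The helper only assembles an argument list and does not run PLINK; the function name and file prefixes are hypothetical, and any species-specific chromosome setting needed for the 18 pig autosomes is deliberately left out.

```python
from typing import List


def plink_roh_args(bfile_prefix: str, out_prefix: str) -> List[str]:
    """Build a PLINK argument list that spells out the ROH criteria explicitly."""
    return [
        "plink",
        "--bfile", bfile_prefix,           # hypothetical binary fileset prefix
        "--homozyg",
        "--homozyg-window-snp", "50",      # (1) sliding window of 50 SNPs
        "--homozyg-window-het", "1",       # (2) one heterozygous call per window
        "--homozyg-window-missing", "5",   # (2) five missing calls per window
        "--homozyg-snp", "100",            # (3) at least 100 SNPs per ROH
        "--homozyg-kb", "1000",            # (4) minimum ROH length of 1 Mb
        "--homozyg-density", "50",         # (5) at least 1 SNP per 50 kb
        "--out", out_prefix,
    ]
```

The resulting list could be passed to `subprocess.run` on a system where PLINK is installed.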

#### Pedigree and Genomic Inbreeding Coefficients

Different types of inbreeding coefficients were estimated based on pedigree and genomic information. Pedigree-based inbreeding coefficients (FPED) for all pigs were estimated using R package "pedigree."

Genomic inbreeding for each animal was estimated from ROH (FROH), as the ratio of the total length of genome covered by ROH to the total length of the genome covered by SNPs or sequences, as proposed by McQuillan et al. (2008):

$$\text{F}\_{\text{ROH}} = \frac{L\_{\text{ROH}}}{L\_{\text{auto}}},\tag{2}$$

in which LROH is the total length of an individual's ROH and Lauto is the length of the autosomal genome covered by the SNPs, which was 2.26 Gb in our study. For each animal, four FROH estimates were calculated from the sequence data as the proportion of the genome in ROH of different lengths: ROH > 10 Mb (FROH>10Mb), 5–10 Mb (FROH5–10Mb), 1–5 Mb (FROH1–5Mb), and all ROH > 1 Mb (FROH\_all), corresponding to inbreeding that occurred up to 5 generations ago, 5 to 10 generations ago, 10 to 50 generations ago, and up to 50 generations ago, respectively.
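Equation (2) and the four length classes can be illustrated with a short sketch. The function name and dictionary keys are hypothetical, and the handling of segments exactly 5 or 10 Mb long (assigned to the shorter class) is an assumption, since the text does not specify it.

```python
from typing import Dict, Iterable

AUTOSOME_LENGTH_MB = 2260.0  # 2.26 Gb covered by SNPs, as reported in the text


def froh_by_class(
    roh_lengths_mb: Iterable[float], autosome_mb: float = AUTOSOME_LENGTH_MB
) -> Dict[str, float]:
    """Return F_ROH per length class and overall for one animal."""
    totals = {"1-5Mb": 0.0, "5-10Mb": 0.0, ">10Mb": 0.0}
    for length in roh_lengths_mb:
        if length < 1.0:
            continue  # below the 1 Mb detection threshold
        if length <= 5.0:
            totals["1-5Mb"] += length
        elif length <= 10.0:
            totals["5-10Mb"] += length
        else:
            totals[">10Mb"] += length
    result = {name: total / autosome_mb for name, total in totals.items()}
    result["all"] = sum(totals.values()) / autosome_mb
    return result
```

By construction, the three class-specific coefficients sum to the overall coefficient for ROH longer than 1 Mb.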

In addition, three SNP-based estimates of inbreeding coefficients were calculated using the --ibc option of the GCTA software (Yang et al., 2011): the first estimator, FSNP1, was based on the variance of the additive genotypes (VanRaden, 2008); the second, FSNP2, was based on the excess of homozygosity; and the third, FSNP3, was based on the correlation between uniting gametes (Wright, 1922). The formulae are as follows:

$$F\_{\text{SNP}1} = \frac{1}{n} \sum\_{i=1}^{n} \frac{(Y\_i - 2p\_i)^2}{h\_i} - 1,\tag{3}$$

$$F\_{\text{SNP2}} = 1 - \frac{1}{n} \sum\_{i=1}^{n} \frac{Y\_i(2 - Y\_i)}{h\_i},\tag{4}$$

$$F\_{\text{SNP3}} = \frac{1}{n} \sum\_{i=1}^{n} \frac{Y\_i^2 - Y\_i(1 + 2p\_i) + 2p\_i^2}{h\_i},\tag{5}$$

where Y<sub>i</sub> is the number of copies of the reference allele for the i-th SNP, p<sub>i</sub> is the frequency of this allele in the sample, h<sub>i</sub> = 2p<sub>i</sub>(1 − p<sub>i</sub>), and n is the total number of SNPs. Note that these coefficients are corrected by the allele frequencies of the current population and can take negative values (Yang et al., 2013), whereas FPED and FROH range from 0 to 1. The inbreeding coefficients obtained by the eight methods were compared using Pearson's correlation.
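Equations (3) to (5) can be written out directly, as in the sketch below for a single animal. This is an illustration and not the GCTA implementation; the function name is hypothetical, and skipping monomorphic SNPs (h<sub>i</sub> = 0) is an assumption made to avoid division by zero.

```python
from typing import Sequence, Tuple


def snp_inbreeding(
    genotypes: Sequence[int], freqs: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (F_SNP1, F_SNP2, F_SNP3) for one animal.

    genotypes: reference-allele counts (0, 1, or 2) per SNP.
    freqs:     reference-allele frequencies per SNP in the sample.
    """
    s1 = s2 = s3 = 0.0
    n = 0
    for y, p in zip(genotypes, freqs):
        h = 2.0 * p * (1.0 - p)
        if h == 0.0:
            continue  # monomorphic SNP: skipped by assumption
        s1 += (y - 2.0 * p) ** 2 / h
        s2 += y * (2.0 - y) / h
        s3 += (y * y - y * (1.0 + 2.0 * p) + 2.0 * p * p) / h
        n += 1
    if n == 0:
        raise ValueError("no polymorphic SNPs available")
    return s1 / n - 1.0, 1.0 - s2 / n, s3 / n
```

With all allele frequencies at 0.5, an animal that is heterozygous at every SNP obtains −1 for all three estimators and a fully homozygous animal obtains +1, which shows why these coefficients are not bounded by 0 and 1.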

#### Detection of Common Runs of Homozygosity

ROH islands are genomic regions with reduced genetic diversity, and the high homozygosity around them suggests that they might harbor targets of positive selection under strong selective pressure (Pemberton et al., 2012). To identify overlapping ROH regions across the genome, we calculated the proportion of individuals in which each SNP occurred in a ROH by counting the number of times the SNP was detected in ROH across individuals, and plotted this against the position of the SNP along the chromosome. The genomic regions most commonly associated with ROH were identified by selecting the top 0.5% of the SNPs most commonly observed in ROH (Pemberton et al., 2012). Adjacent candidate SNPs were merged into ROH islands, and the genes within each ROH island were extracted using the BIOMART package (Durinck et al., 2005). To further analyze the functions of the identified genes, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analyses were performed using DAVID 6.8<sup>3</sup>. Only terms with a p-value less than 0.05 were considered significant and listed.
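The island detection procedure described above can be sketched for a single chromosome as follows. All function names are hypothetical; SNP positions are assumed to be sorted, and "adjacent" is interpreted as consecutive SNPs that all reach the threshold, which is an assumption about the merging rule.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start, end) of one ROH in base pairs


def roh_occurrence(
    snp_positions: Sequence[int], roh_per_animal: Sequence[Sequence[Segment]]
) -> List[float]:
    """Fraction of animals in which each SNP lies inside a ROH."""
    n_animals = len(roh_per_animal)
    if n_animals == 0:
        raise ValueError("at least one animal is required")
    fractions = []
    for pos in snp_positions:
        hits = sum(
            any(start <= pos <= end for start, end in segments)
            for segments in roh_per_animal
        )
        fractions.append(hits / n_animals)
    return fractions


def top_fraction_threshold(values: Sequence[float], top: float = 0.005) -> float:
    """Smallest occurrence value among the top `top` share of SNPs."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * top))
    return ranked[k - 1]


def merge_islands(
    snp_positions: Sequence[int], fractions: Sequence[float], threshold: float
) -> List[Segment]:
    """Merge runs of consecutive SNPs at or above the threshold into islands."""
    islands: List[Segment] = []
    start = prev = None
    for pos, frac in zip(snp_positions, fractions):
        if frac >= threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            islands.append((start, prev))
            start = None
    if start is not None:
        islands.append((start, prev))
    return islands
```

In a genome-wide analysis the threshold would be derived from all autosomal SNPs jointly and the merging step then applied per chromosome.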

#### Detection of Selection Signatures Within Jinhua Pigs

To compare with the selection signatures obtained from ROH, the integrated haplotype score (iHS) test was performed within Jinhua pigs. The iHS is a measure of the amount of extended haplotype homozygosity at a given SNP, designed to use phased genotypes to identify putative regions of recent or ongoing positive selection in genomes (Voight et al., 2006). Haplotypes were phased using fastPHASE with default parameters (Scheet and Stephens, 2006). The derived haplotypes were then analyzed using the rehh v2.0 R package (Gautier et al., 2017) as previously reported by Bertolini et al. (2018). The iHS score was computed for each autosomal SNP, and the values obtained were standardized so that they followed a standard normal distribution. To calculate the p-value at the genomic level, the scores for each SNP were transformed as piHS = −log<sub>10</sub>[1 − 2|Φ(iHS) − 0.5|], where Φ(x) represents the Gaussian cumulative distribution function and piHS is the −log<sub>10</sub> of the two-sided p-value associated with the neutral hypothesis of no selection (Gautier et al., 2017). Corresponding to the threshold of 0.5% for ROH islands, |iHS| scores higher than 2.81 (p < 0.005) were considered putative signatures of selection (Cardoso et al., 2018). In this study, significant iHS signals were reported only for ROH islands.

<sup>3</sup>https://david.ncifcrf.gov/
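The transformation of standardized iHS values into p-values can be checked with a few lines of standard-library code. This sketch does not call rehh; the function names are hypothetical, and the input is assumed to be an iHS value that has already been standardized.

```python
import math
from statistics import NormalDist


def ihs_two_sided_p(ihs: float) -> float:
    """Two-sided p-value of a standardized iHS under the standard normal."""
    phi = NormalDist().cdf(ihs)
    return 1.0 - 2.0 * abs(phi - 0.5)


def ihs_neg_log10_p(ihs: float) -> float:
    """piHS: minus log10 of the two-sided p-value."""
    p = ihs_two_sided_p(ihs)
    if p <= 0.0:
        return math.inf  # p underflows for extreme scores
    return -math.log10(p)
```

Under these definitions, |iHS| = 2.81 gives a two-sided p-value just below 0.005, in line with the threshold used in the text.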

#### RESULTS AND DISCUSSION

A total of 166,661 informative SNPs satisfying the quality filters were obtained, 26,458 of which had not previously been reported in the NCBI pig SNP database. The SNP density was about 1 SNP per 13.6 kb, and SNPs were approximately evenly distributed along each chromosome, with the exception of some isolated regions (**Figure 1**). According to the Ensembl pig gene annotation set (Ensembl release 92), 81,753 SNPs were mapped to gene regions, of which 8,092 were mapped to exons and 4,481 to UTRs.

#### Genetic Diversity

The average observed (H<sub>o</sub>) and expected (H<sub>e</sub>) heterozygosities in this Jinhua pig population were 0.312 ± 0.070 and 0.429 ± 0.057, respectively. The value of H<sub>e</sub> was slightly higher than that reported by Chen et al. (2018) for the Jinhua pig breed in Zhejiang province. The earliest legend regarding Jinhua pigs may be traced back to approximately 1600 years ago (China National Commission of Animal Genetic Resources, 2011). Considering a generation interval of 1.5 years, 1000 generations correspond to the Jinhua population approximately 1500 years ago. N<sub>e</sub> was estimated from five to 1000 generations ago in our study. The results show that N<sub>e</sub> has decreased through time, at a faster rate between 1,000 and 970 generations ago (**Supplementary Figure S1**). This finding could be explained by the domestication bottleneck caused by human-driven artificial selection approximately 1500 years ago. The effective population size was about 3018 in the 1000th generation and about 88 in the last five generations. The recent estimate was larger than that reported by Xiao et al. (2017) for pig breeds in the Taihu region of China (ranging from 47 to 71) using the same method, which may indicate higher genetic diversity in Jinhua pigs.

## Pedigree and Genomic Inbreeding Coefficients Estimate

The average inbreeding coefficients estimated using the different approaches are shown in **Table 1**. Because an incomplete pedigree fails to capture relatedness among founders of the base population, the levels of inbreeding based on pedigree were expected to be lower than those based on ROH and SNP-by-SNP approaches (**Table 1**). The average FROH based on larger segments (0.041–0.053) was closer to FPED (0.01) than the average FSNP was. This was to some extent expected, given that the pedigree depth (about four generations) is in agreement with larger ROH segments. In addition, these two coefficients vary in the same range (0–1), whereas the SNP-by-SNP coefficients estimated here can take values outside this range. These low average FROH values suggest that recent inbreeding was low. However, according to the SNP-by-SNP based coefficients, which reflect deviations of the observed inbreeding from the expected values in the current population, recent inbreeding seems to be considerably higher (0.262) than that based on pedigree and ROH.



Min, minimum; Max, maximum; SD, standard deviation; N, the numbers of animals.

In contrast, the correlations between FPED and all the genomic coefficients were low (from −0.009 to 0.053, **Table 2**), which may be indicative of a lack of power of FPED to determine relatedness among founders from the base population (Visscher et al., 2006). These low correlations may also be affected by poor and incomplete pedigree recording, as the base population assumed for FPED, based on long ROH and FSNP, was within the range of the last four generations. Correlations between genomic coefficients based on ROH and on SNP-by-SNP approaches were considerably higher (from 0.218 to 0.698) and, as expected, were higher between coefficients computed from the same source of information (i.e., ROH segments or SNP-by-SNP information), suggesting that genotype-based estimates provide greater accuracy on relatedness, as supported by previous studies (Purfield et al., 2012; Zanella et al., 2016).

Among the four inbreeding coefficients based on different ROH lengths, FROH\_all (FROH>1Mb) had higher correlations with FSNP1, FSNP2, and FSNP3. A similar trend was also reported by Purfield et al. (2017) while studying six commercial meat sheep breeds. Among the three inbreeding coefficients based on SNP-by-SNP, FSNP2 had higher correlations with FROH1–5Mb, FROH5–10Mb, FROH>10Mb, and FROH\_all. These results corroborate previous results observed in cattle (Zhang et al., 2015a; Mastrangelo et al., 2016; Purfield et al., 2017). Similarly, Zhang et al. (2015a) also found that FSNP2, based on excess of homozygosity, correlated relatively highly with FROH detected from 50k and sequence data. This trend may be due to the fact that both FROH and FSNP2 directly reflect homozygosity in the genome (Brito et al., 2017): FSNP2 captures (to some extent) all of the homozygosity, whereas FROH uses only ROH. Furthermore, the moderate to high correlations between FROH and the three other estimates of genomic inbreeding (FSNP1, FSNP2, and FSNP3) suggest that the proportion of the genome in ROH can be used as an accurate estimate of individual inbreeding levels (Purfield et al., 2012; Peripolli et al., 2018).

#### Genomic Distribution of Runs of Homozygosity

The abundance and genomic distribution of ROH provide useful information about the demographic history of livestock species (Bosse et al., 2012). In total, 18,690 ROH were identified in 202 individuals. The mean ROH length was 4.11 Mb, and the longest segment, found on SSC1, spanned 72.45 Mb (2,237 SNPs). The distribution of ROH according to length is shown in **Figure 2A**. Descriptive statistics of ROH number and length by class are given in **Table 3**. Most ROH in Jinhua pigs were shorter segments (1–5 Mb), which accounted for approximately 77% of all ROH detected and contributed about 43% of the cumulative ROH length. In contrast, larger ROH (>10 Mb), which made up only 8% of all ROH, still covered about 32% of the total ROH length. These results reveal that both ancient (up to 50 generations ago) and recent (within the last five generations) inbreeding have had an impact on the genome of the Jinhua pig population.

For individuals, the relationship between total number of ROH and total length of the genome covered by ROH showed considerable variation among animals (**Supplementary Figure S2**). Individual ROH autosome coverage ranged from 5.32% (120.52 Mb) to 29.14% (659.87 Mb) in the Jinhua population. Similar distributions were also observed in other livestock species, such as sheep (Mastrangelo et al., 2017) and cattle (Peripolli et al., 2018).

For chromosomes, the number of ROH per chromosome and the percentage of each chromosome covered by ROH are shown in **Figure 2B**. The highest number of ROH per chromosome was on SSC6 (1,672 segments), whereas the lowest was on SSC16 (485 segments). On average, approximately 16.8% of the whole genome was covered by ROH segments, with the lowest coverage on SSC11 (14.0%) and the highest on SSC17 (24.1%).

TABLE 2 | Correlation coefficients (lower panel) between pedigree-based inbreeding coefficients (FPED), four inbreeding coefficients based on different ROH lengths (FROH1–5Mb, FROH5–10Mb, FROH>10Mb, and FROH\_all) and three inbreeding coefficients based on SNP-by-SNP (FSNP1, FSNP2, and FSNP3).

∗∗Significantly different p < 0.01.

FIGURE 2 | Distribution of the runs of homozygosity (ROH). (A) Distribution of ROH in different lengths (Mb). The values of length in Mb were log10-transformed. (B) Number of ROH longer than 1 Mb per chromosome (bars) and average percentage of each chromosome covered by ROH (red line).

#### Detection of Runs of Homozygosity Islands

Twenty-seven percent of SNPs were located in ROH in at least 20% of individuals, suggesting that candidate autozygosity regions are present in this population. This finding was similar to that reported by Ferencakovic et al. (2013). The SNP most frequently detected in ROH (131 occurrences, 64.9%) mapped at ∼36 Mb on SSC3 according to the updated reference genome (Sscrofa11.1). No genes have currently been mapped to this position, suggesting that regulatory regions may be involved.

To identify the genomic regions most commonly associated with ROH across all individuals, the top 0.5% of SNPs with the highest occurrence in ROH (occurring in over 45% of the samples) were considered candidate SNPs (**Figure 3**). A total of 11 ROH island regions were identified, ranging in length from 90 bp on SSC10 to 3.62 Mb on SSC3 (**Table 4**). The SNPs within these regions showed significantly higher linkage disequilibrium levels than the estimates obtained for the entire chromosome (**Supplementary Table S1**). On SSC8, we found the longest ROH cold-spot, comprising 194 contiguous SNPs (12.56 Mb) that were not part of a ROH in any individual, suggesting a region of high heterozygosity. This region might result from high recombination rates, or might harbor loci with heterozygote advantage under selection favoring high haplotype diversity. These results are in good agreement with the low LD levels found in this region (**Supplementary Figure S3**) (Barrett, 2009). Likewise, ROH islands showed high levels of LD, as expected (**Supplementary Figure S4**).

## Candidate Genes Within Runs of Homozygosity Islands

Chromosome, start and end positions of ROH, ROH length, number of SNPs, and number of genes within the genomic regions of extended homozygosity are reported in **Table 4**. We found that some SNPs in ROH occurred in gene-poor regions. Some identified regions, such as that on SSC15, contained only one annotated gene despite being longer than 1.3 Mb, either because the annotation of the pig reference genome is still incomplete or because the region is non-coding. A total of 105 genes inside the ROH islands were analyzed using GO enrichment analysis. **Supplementary Table S2** provides the chromosome, start and end positions, gene name, and Ensembl Gene ID for the 105 genes. **Supplementary Table S3** shows the significant GO terms and KEGG pathways; most of the genes were involved in metabolic pathways and biosynthetic processes.

We also checked whether the ROH islands overlapped with putative selection signatures reported for pigs in the literature. We found that a ROH island at SSC7: 100881377–100912691 overlapped with the gene ADCK1, which is involved in phosphate metabolism and lies in the selection signature region between Berkshire and Korean native pig breeds (Edea and Kim, 2014). A ROH island on SSC2 partially overlapped with a selection region for intramuscular fat and backfat thickness in two Duroc populations, which spanned three genes (ABCC8, MYOD1, and PIK3C2A) (Kim et al., 2015). The MYOD1 gene in this region was also detected in the selection signatures between the Jinhua pig group and the European breeds group (Li et al., 2016).

CHR, chromosome; SNPs, number of SNPs in each genomic region; Genes, number of genes in each genomic region.

In this paper, we focused on some of the most relevant genes within ROH that showed associations with specific traits related to livestock breeding. Several candidate genes related to reproduction traits were identified, such as the HOXA gene cluster (HOXA3, HOXA7, HOXA10, and HOXA11) on SSC18, which affects embryo implantation and prolificacy traits (Bagot et al., 2000; Gao et al., 2010; Wu et al., 2013); ROPN1, involved in litter size in pigs (Lan et al., 2012); and HNRNPA2B1, which plays key roles in the preimplantation pig embryo during elongation (Wilson et al., 2000). Some genes associated with meat quality traits were also detected: MYOD1, which affects muscle fiber characteristics, loin eye area, and backfat thickness (Lee et al., 2012; Cepica et al., 2013); LPIN3, a member of the lipin gene family associated with backfat thickness in pigs (He et al., 2009), whose members are important regulators in fat-tailed sheep with active lipid metabolism (Jiao et al., 2016); and CTNNBL1, associated with porcine fat deposition and backfat traits (Yin et al., 2012). One gene was involved in appetite: NUCB2, which plays an important role in whole-body energy homeostasis and body weight at puberty through the regulation of appetite (Lents et al., 2013). Most of the genes we detected were involved in disease resistance traits: MUC4, MUC13, MUC20, LMLN, ITGB5, HEG1, SLC12A8, and MYLK on SSC13 are potential candidate genes for controlling the expression of the receptor for enterotoxigenic Escherichia coli (ETEC) with F4 fimbriae (F4ac) (Huang et al., 2008; Jacobsen et al., 2010; Rampoldi et al., 2011; Fu et al., 2012; Ren et al., 2012). These genes play key roles in resistance to diarrhea by preventing the attachment and adhesion of ETEC to porcine jejunal cells and in maintaining the epithelial barrier as well as immune function (Zhou et al., 2013).
Many studies have revealed that resistance to ETEC F4ac adhesion in pigs can be inherited as an autosomal recessive trait, and pigs with the homozygous genotype were usually resistant to ETEC F4ac (Rampoldi et al., 2011). The regional climate of the Jinhua pig is mainly subtropical, with hot and very humid weather that is capable of inducing diarrhea, especially during summer (Lyutskanov, 2011). Several studies have also shown that Jinhua pigs were more resistant to ETEC F4ac (Yan et al., 2009; Gao et al., 2014). These genes on SSC13 that display autozygosity in the Jinhua pigs may be linked to selection in response to the hot and humid climate, as a result of local adaptation.

The Pig quantitative trait loci (QTL) database<sup>4</sup> lists several QTL for reproduction, meat quality and immunity traits that overlapped these ROH islands (Quintanilla et al., 2011; Verardo et al., 2015; Zhang et al., 2016). In particular, some QTLs related to ham traits have already been reported: on SSC2, Cepica et al. (2013) identified significant QTL for ham weight (ID = 28220), Harmegnies et al. (2006) reported QTL for ham fat thickness (ID = 3938, ID = 3960, and ID = 3968); on SSC3, Choi et al. (2011) detected highly significant QTL for ham weight (ID = 21357), Stratil et al. (2006) reported QTL for ham meat weight (ID = 3102) and ham weight (ID = 3104).

In summary, the results show that the genomic regions that display autozygosity in the Jinhua pig breed are related to important production traits under selection, and possibly also help improve their adaptability to survive in hot and very humid environments.

#### Selection Signature Analysis

Results from the iHS test revealed that this coefficient had an average value of 0.77, with a maximum value of 4.69 on SSC7, indicating that the iHS values were not uniform across the genome. **Figure 4** shows the genome-wide distribution of |iHS| values. The plots suggest evidence of selective forces in different regions of the genome. To compare the ROH and iHS methods, the occurrence of a SNP in a ROH was correlated with the SNP |iHS| value (**Supplementary Figure S5**). A significant moderate correlation was found between the iHS selection signature method and the percentage of occurrence of a SNP in a ROH (Pearson's correlation coefficient = 0.25, P < 0.0001). The average |iHS| value was also calculated for each ROH island (**Table 4**). The mean |iHS| value of SNPs in each ROH island (except one on SSC10) was higher than that across the genome (0.77) (**Table 4**).
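The comparison described above amounts to a Pearson correlation between two per-SNP quantities: the percentage of animals in which the SNP falls inside a ROH, and the SNP's |iHS| value. The following is a minimal sketch of that calculation only; the function name and the toy values are hypothetical and are not taken from the study's data or software.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: one entry per SNP (illustrative values only).
roh_pct = [5.0, 12.0, 30.0, 45.0, 8.0, 22.0]  # % of animals with the SNP in a ROH
abs_ihs = [0.4, 0.9, 1.1, 2.3, 0.7, 0.6]      # |iHS| for the same SNPs

r = pearson_r(roh_pct, abs_ihs)
print(f"Pearson r = {r:.3f}")
```

With genome-wide SNP counts, even a modest coefficient such as the reported 0.25 can be highly significant, so the coefficient and its P-value are best read together.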

There were 1,535 SNPs with p-value < 0.005 that harbored signatures of selection, 92 of which were found in ROH islands (**Supplementary Table S4**). A total of 42 candidate genes were found to overlap with these regions (**Supplementary Table S5**). These include several genes mentioned above, such as MYOD1, ADCK1, LPIN3, ITGB5, NUCB2, PIK3C2A, and ABCC8. Genes identified by both methods should be given more consideration in further studies. The significant correlation between the iHS selection signature method and the percentage of occurrence of a SNP in a ROH, in the present study and elsewhere (Zhang et al., 2015b), supports the hypothesis that the observed ROH islands are not only the result of demography, but could also be due to selection.
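Counting how many significant SNPs fall inside ROH islands is an interval-overlap check. The sketch below illustrates that step under simple assumptions: islands are given as (chromosome, start, end) tuples with inclusive coordinates, and SNPs as (chromosome, position) tuples. All names and coordinates are hypothetical placeholders, not values from the Supplementary Tables.

```python
def build_island_index(islands):
    """Group (chrom, start, end) ROH islands by chromosome."""
    index = {}
    for chrom, start, end in islands:
        index.setdefault(chrom, []).append((start, end))
    return index


def snps_in_islands(snps, islands):
    """Return the SNPs (chrom, pos) that lie inside any ROH island."""
    index = build_island_index(islands)
    hits = []
    for chrom, pos in snps:
        # Linear scan per chromosome; adequate for a handful of islands.
        if any(start <= pos <= end for start, end in index.get(chrom, [])):
            hits.append((chrom, pos))
    return hits


# Hypothetical toy input (coordinates are illustrative only).
roh_islands = [("SSC13", 100, 200), ("SSC2", 50, 80)]
significant_snps = [("SSC13", 150), ("SSC13", 250), ("SSC2", 50), ("SSC7", 60)]

overlapping = snps_in_islands(significant_snps, roh_islands)
print(len(overlapping), "of", len(significant_snps), "significant SNPs lie in ROH islands")
```

For genome-scale inputs, a sorted-interval or interval-tree lookup would replace the linear scan, but the logic is the same.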

#### CONCLUSION

To our knowledge, this is the first study to describe the occurrence and distribution of ROH in the genome of Jinhua pigs. Autozygosity levels varied largely in this population, which has experienced both recent and historical inbreeding events. We have shown that, despite the low to moderate inbreeding levels in most animals, there were individuals with high inbreeding coefficients, indicating the need to account for inbreeding when planning mating strategy. Several genes within ROH islands are associated with adaptive and economic traits and should be the subject of future investigation. These findings may contribute to the understanding of the effects of environmental and artificial selection in shaping the distribution of functional variants in pig genome.

<sup>4</sup>https://www.animalgenome.org/cgi-bin/QTLdb/SS/index

#### DATA AVAILABILITY

All BAM data were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). 202 samples are available under the Bioproject No. PRJNA525747.

#### ETHICS STATEMENT

All experimental procedures were approved by the Institutional Animal Care and Use Committee of Shanghai Jiao Tong University, and all methods involving pigs were carried out in accordance with the agreement of the Institutional Animal Care and Use Committee of Shanghai Jiao Tong University (Contract No. 2011–0033).

#### AUTHOR CONTRIBUTIONS

YP, QW, and ZX designed the experiments. ZX, HS, QZ, XZ, QL, and YY performed the experiments. ZX, ZZ, and PM developed some of the analysis software. ZX wrote the manuscript with the help of BO. All authors read and approved the final manuscript.

#### FUNDING

This work was supported by Zhejiang Province agriculture (livestock) varieties breeding Key Technology R&D Program (Grant No. 2016C02054-2), National Natural Science Foundation of China (Grant No. 31872976), and National Natural Science Foundation of China (Grant No. U1402266).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00274/full#supplementary-material



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Xu, Sun, Zhang, Zhao, Olasege, Li, Yue, Ma, Zhang, Wang and Pan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The Genetics of Life and Death: Virus-Host Interactions Underpinning Resistance to African Swine Fever, a Viral Hemorrhagic Disease

Christopher L. Netherton<sup>1</sup>\*, Samuel Connell<sup>1</sup>, Camilla T. O. Benfield<sup>2</sup> and Linda K. Dixon<sup>1</sup>\*

<sup>1</sup> The Pirbright Institute, Woking, United Kingdom, <sup>2</sup> Royal Veterinary College, London, United Kingdom

#### Edited by:

Andrea B. Doeschl-Wilson, Roslin Institute, University of Edinburgh, United Kingdom

#### Reviewed by:

Filippo Biscarini, Italian National Research Council (CNR), Italy; Christine Tait-Burkard, University of Edinburgh, United Kingdom

#### \*Correspondence:

Christopher L. Netherton, chris.netherton@pirbright.ac.uk; Linda K. Dixon, linda.dixon@pirbright.ac.uk

#### Specialty section:

This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics

Received: 13 July 2018; Accepted: 12 April 2019; Published: 03 May 2019

#### Citation:

Netherton CL, Connell S, Benfield CTO and Dixon LK (2019) The Genetics of Life and Death: Virus-Host Interactions Underpinning Resistance to African Swine Fever, a Viral Hemorrhagic Disease. Front. Genet. 10:402. doi: 10.3389/fgene.2019.00402

Pathogen transmission from wildlife hosts to genetically distinct species is a major driver of disease emergence. African swine fever virus (ASFV) persists in sub-Saharan Africa through a sylvatic cycle between warthogs and soft ticks that infest their burrows. The virus does not cause disease in these animals; however, transmission of the virus to domestic pigs or wild boar causes a hemorrhagic fever that is invariably fatal. ASFV transmits readily between domestic pigs and causes economic hardship in areas where it is endemic. The virus is also a significant transboundary pathogen that has become established in Eastern Europe and has recently appeared in China, increasing the risk of an introduction of the disease to other pig producing centers. Although a DNA genome militates against rapid adaptation of the virus to new hosts, extended epidemics of African swine fever (ASF) can lead to the emergence of viruses with reduced virulence. Attenuation in the field leads to large deletions of genetic material encoding genes involved in modulating host immune responses. Therefore, resistance to disease and tolerance of ASFV replication can be dependent on both virus and host factors. Here we describe the different virus-host interfaces and discuss progress toward understanding the genetic determinants of disease outcome after infection with ASFV.

Keywords: African swine fever virus (ASFV), interferon, warthog, Ornithodoros, host resistance, host tolerance, viral hemorrhagic fever, DNA virus infection

## INTRODUCTION

African swine fever virus (ASFV) is present in a stable equilibrium with its wildlife hosts, warthogs and soft ticks of Ornithodoros spp., in a unique ecological niche in Eastern and Southern Africa. In these hosts virus can persist over an extended time without causing disease. However, infection of domestic pigs or wild boar with ASFV results in an invariably fatal disease, African swine fever (ASF), which is readily spread between infected pigs or wild boar without the requirement of a tick vector (**Figure 1**). ASFV can also infect and replicate in bushpigs, but like the warthog these animals do not exhibit clinical signs of disease. Understanding the virus interactions with, and evolution within, these different hosts will help establish the basis for the dramatically varying pathogenesis and potentially unravel the basis for disease resistance of the wild suids in Africa.

**Abbreviations:** ASF, African swine fever; ASFV, African swine fever virus; IFN, interferon; ISG, interferon stimulated gene; MGF, multigene family.

ASF was first recognized in the early twentieth century in Kenya as an acute hemorrhagic fever that caused death of most infected domestic pigs (Montgomery, 1921). Early experiments established that warthogs did not show clinical signs of disease but provided a reservoir of infection. ASF was recognized in many Eastern and Southern African countries soon after the initial description and spread further through central and West Africa (Jori et al., 2013; Penrith et al., 2013). From Africa, ASFV expanded into Portugal in 1957 and 1960 and became endemic in the Iberian Peninsula until it was eradicated in the mid-1990s. During this time the disease also became established in Sardinia and sporadic outbreaks were reported in Western Europe, Brazil and the Caribbean. From 1999, with the exception of Sardinia, no further outbreaks of ASF were reported outside of Africa until its appearance in Georgia in the Caucasus region in 2007. Currently ASF is present in sub-Saharan Africa, Sardinia, the Trans Caucasus, the Russian Federation, and Central and Eastern states of the European Union. ASFV continues to spread, with first reports of the disease in China (August 2018), Bulgaria (August 2018), Belgium (September 2018), and Vietnam (February 2019) highlighting the increasing threat of ASF to the global pig industry.

Here we discuss ASFV infection of domestic pigs, wild boar and other wildlife hosts, summarizing current knowledge of how host and viral genetics contribute to pathogenesis and the different disease outcomes seen in different hosts. We also discuss prospects of how these differences might be leveraged to inform breeding or genetic engineering strategies to improve disease resistance in the domestic pig population.

## ASFV GENETICS

## ASFV Genetic Variability

African swine fever virus is a large double-stranded DNA virus, which replicates predominantly in the cell cytoplasm and shares a similar replication cycle and genome structure with the poxviruses. However, the icosahedral virus morphology differs from the poxviruses and genome sequencing established that ASFV is the only member of a unique virus family, the Asfarviridae. Genome sequencing also showed a distant relationship between ASFV and some giant viruses that infect lower eukaryotes, including Faustovirus, Pacmanvirus and Kaumoebavirus (Reteno et al., 2015; Bajrai et al., 2016; Andreani et al., 2017). Thus these diverse viruses may have shared a common ancestor. ASFV's genome varies between 170 and 190 kb in length. These gross differences in genome size are predominantly due to differences in the copy number of five different multigene families (MGF); for example, the copy number of MGF 360 can vary between 11 and 18 in field isolates (Chapman et al., 2008). Promotion of homologous recombination or unequal crossover during DNA replication within infected cells (Rodríguez et al., 1992) is a likely driver of the loss and exchange of genetic material that has been observed in isolates from both ticks and domestic pigs (Dixon and Wilkinson, 1988; Chapman et al., 2011). Interestingly, the rapid amplification of individual genes by gene duplication under selection pressure has been observed in poxviruses (Elde et al., 2012) and a similar mechanism may have contributed to differences in the copy number of individual MGFs in ASFV. Paralogs of the MGF genes can be very divergent in sequence, indicating evolution over an extended period. This divergence may reflect selection pressures such as altered host tropism. Gene families have evolved in other viruses; for example, vaccinia virus encodes a highly divergent family of proteins containing a Bcl-2 protein fold which have different roles in evasion of innate immune responses, including apoptosis and signaling pathways (Graham et al., 2008; Kvansakul et al., 2008; Neidel et al., 2015). Errors in unit genome resolution during the head-to-head DNA replication can also result in sequence transpositions from one genome end to the other, as seen in a recent ASFV isolate from northern Estonia that had 14 kb deleted from the left end of the genome and replaced with 7 kb from the right (Zani et al., 2018).

ASFV isolates have been divided into genotypes based on partial sequencing of the B646L gene, which encodes the ASFV major capsid protein p72. This has defined 24 different genotypes to date (**Figure 2**; Bastos et al., 2003; Quembo et al., 2018) which fall into three main lineages (Boshoff et al., 2007). However, a limitation of this approach is that the number of nucleotide differences between closely related genotypes can be low. Nevertheless, phylogenetic trees constructed from short stretches of conserved genes such as p72 do broadly fit with those generated from concatenated conserved nucleotide or protein sequences (de Villiers et al., 2010). Currently 30 complete genome sequences are available, but more than two-thirds of these are of three related genotypes and are therefore clearly not representative of the 24 described genotypes based on p72 sequencing, limiting opportunities to infer evolutionary relationships. Up to 18 genes under positive selection for diversification have been identified by comparing rates of synonymous to non-synonymous substitutions at individual amino acids (de Villiers et al., 2010). These included members of MGF 360 and 505 families, genes involved in modulating host cell functions, several enzymes, the CD2-like and C-type lectin genes and the virus capsid protein chaperone B602L. Drivers for diversification might include immune or host genetic pressure. The major capsid protein did not have any sites under strong positive selection, indicating strong stabilizing selection (de Villiers et al., 2010).

## ASFV Modulation of the Host Response in the Domestic Pig

#### Inhibitors of Type I Interferon

ASFV encodes a number of proteins that inhibit innate immune responses including type I interferon (IFN), the main antiviral response. Stimulation of cellular pattern recognition receptors by an array of pathogen associated molecular patterns induces signaling pathways leading to transcription of type I IFN (Bowie and Unterholzner, 2008; Schoggins et al., 2011; Thompson et al., 2011). The secreted type I IFN activates signaling in infected and bystander cells leading to transcription of over 300 interferon stimulated genes. These include proteins that induce an antiviral state, via blocking the viral replication cycle or activating components of protective innate and adaptive immune responses (Schoggins et al., 2011). For example Mx proteins sequester viral replication factors preventing efficient replication (Netherton et al., 2009) and IFITM proteins restrict virus entry by inhibiting membrane fusion (Benfield et al., 2015).

Although their functional role is currently poorly understood, and they have no obvious similarity to other genes or proteins, there is mounting evidence to suggest that MGF genes may play a role in both host range and subversion of the innate immune system. Sequence analysis indicated that the low virulence isolate OURT88/3 lacks eight MGF genes (MGF360-10L, 11L, 12L, 13L, 14L, MGF505-1R, 2R, 3R), which are otherwise present in virulent ASFV isolates, suggesting they may play a role in virulence (Chapman et al., 2008; Dixon et al., 2013). Furthermore, levels of IFN in the bloodstream peak before viremia does, indicating the ability of virulent viruses to withstand the host IFN response (Karalyan et al., 2012; Golding et al., 2016). Indeed, IFN priming of primary macrophages limited replication of attenuated OURT88/3 but not virulent isolates (Golding et al., 2016).

ASFV isolates lacking these specific MGF genes, including genetically modified viruses with the genes in question deleted, induce a stronger innate immune response. Deletion of five MGF360 and three MGF505 genes from highly virulent Benin 97/1 resulted in attenuation, increased IFNβ production in vitro and significantly enhanced protection in vivo against challenge with parental virus (Reis et al., 2016). Genes from the MGF360 and MGF505 cluster are directly responsible for suppressing IFN responses in vitro in cells infected with virulent Pr4 (Afonso et al., 2004) and for overcoming IFN-mediated inhibition of virus replication (Golding et al., 2016). Further experiments were not able to directly attribute this function to a sole gene or MGF family. A subset of these genes is also important for host range in Ornithodoros ticks (Burrage et al., 2004); however, the mode of action in the arthropod vector is unknown. Other ASFV genes shown to inhibit type I IFN responses include I329L, an antagonist of Toll-like receptor 3 signaling (de Oliveira et al., 2011).

#### Inhibitors of Apoptosis

Induction of apoptosis can limit virus replication and many viruses, including ASFV, encode apoptosis inhibitors (Dixon et al., 2017). These include a Bcl-2 family member A179L, inhibitor of apoptosis member A224L and a C-type lectin protein EP153R (Hurtado et al., 2004). Other ASFV proteins inhibit stress-activated apoptosis (Zhang et al., 2010). The A179L protein has an unusually broad specificity of binding to pro-apoptotic Bcl-2 family BH3 domain-containing proteins (Banjara et al., 2017). This may allow for functionality in both mammalian and Ornithodoros hosts.

FIGURE 2 | Neighbor-Joining phylogenetic tree of representative ASFV isolates. The evolutionary history was inferred using the Neighbor–Joining method. The optimal tree with the sum of branch length = 0.29203136 is shown. The bootstrap test values (i.e., percentage of replicate trees in which the associated taxa clustered together, 1000 replicates) are shown next to the nodes. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Kimura 2-parameter method and are in the units of the number of base substitutions per site. The analysis involved 47 nucleotide sequences. All positions containing gaps and missing data were eliminated. There were a total of 399 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Kumar et al., 2016). Symbols indicate isolates shown in Figure 3, all other isolates were obtained from domestic pigs. L1, L2, and L3 indicate the lineages identified by Boshoff et al. (2007). Full details of these isolates are provided in Supplementary Table S1.
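The Figure 2 legend states that evolutionary distances were computed with the Kimura 2-parameter method, in units of base substitutions per site. As a rough illustration of what that distance is, the sketch below computes it for one pair of aligned sequences from the proportions of transitions (P) and transversions (Q), using d = -0.5 ln(1 - 2P - Q) - 0.25 ln(1 - 2Q). This is not MEGA7's implementation; the function name and toy sequences are hypothetical, and sites with gaps or ambiguous bases are skipped per pair here, whereas the legend describes removing such positions across the whole alignment.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases for this pair
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Hypothetical toy alignment: one transition (A->G) in ten sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(f"K2P distance = {d:.4f} substitutions per site")
```

A matrix of such pairwise distances is the input a Neighbor-Joining algorithm uses to build a tree like the one in Figure 2.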

#### Adhesion Proteins

The ASFV CD2-like protein causes binding of red blood cells to extracellular virions and infected cells. This protein has roles in virus dissemination and persistence in blood in the mammalian host (Borca et al., 1998) as well as facilitating virus uptake into the tick vector (Rowlands et al., 2009). Both of these functions may provide an advantage for virus replication in the tick-warthog cycle.

## INFECTION AND PATHOGENESIS IN DIFFERENT HOST SPECIES

## ASFV in Domestic Pigs

#### Transmission of ASFV to Domestic Pigs and Wild Boar

The emergence of ASFV from its ancient sylvatic ecology in Eastern and Southern Africa, involving warthogs and soft tick vectors of the Ornithodoros spp., into domestic pigs and wild boar has resulted in a dramatic change in the pathogenesis of the virus and the mechanisms by which transmission occurs. Transmission by the tick vector is not required in the domestic pig or wild boar cycle and direct transmission between pigs occurs readily in the absence of the tick vector (epidemiological cycles 3 and 4 in **Figure 1**; Wilkinson, 1984; Guinat et al., 2016). Indeed the ancient sylvatic cycle involving warthogs and ticks has only been described in parts of Eastern and Southern Africa (Jori et al., 2013) meaning that spread through other susceptible populations is unlikely to have placed the same constraints on virus replication.

ASF first spread outside Africa, to Portugal and Spain and from there to a number of other European countries, as well as Brazil and the Caribbean. The disease persisted in the Iberian Peninsula for over three decades, but was eradicated from all of these countries except Sardinia by the mid-1990s. In the Iberian Peninsula ASFV circulated in pigs, wild boar and Ornithodoros erraticus, whereas soft ticks did not play a role in Sardinia (Sánchez-Vizcaíno et al., 2015). However, wild boar were not thought to play a significant role in maintaining the virus. In the present epidemic in Russia and Eastern Europe, wild boar have played an important role in spread of disease and maintaining a wildlife reservoir (Abrahantes et al., 2017; Chenais et al., 2018) and there is no evidence of a role for soft ticks. Wild boar show similar clinical signs to domestic pigs and case fatality rates are also close to 100% following infection with highly virulent isolates (Gabriel et al., 2011). Studying the evolution of ASFV clinical forms and associated viral genetic changes during the current epidemic in Europe provides an excellent opportunity to follow the virus adaptation to different hosts.

#### Pathogenesis of ASFV in Domestic Pigs and Wild Boar

Early descriptions of ASF disease in domestic pigs were of an acute hemorrhagic fever causing death of close to 100% of infected pigs (Montgomery, 1921). This is still the predominant disease form reported in both Africa and Europe (Tauscher et al., 2015). However, different disease courses in pigs have been associated with isolates which vary in virulence. Moderately virulent isolates result in death of a lower percentage of animals and a subacute form of the disease. Low virulence isolates may cause few if any deaths and a chronic form of disease characterized by the absence of vascular lesions but with signs such as delayed growth, emaciation, joint swelling, skin ulcers and lesions associated with secondary bacterial infection. Moderately virulent and low virulence isolates were described after the introduction of the virus into Spain and Portugal, and similar isolates have now been described from different countries in Africa (Souto et al., 2016) and also in Eastern Europe (Gallardo et al., 2018; Zani et al., 2018). Detection of ASFV specific antibodies in serum from wild boar in Eastern Europe may indicate reduced virulence of circulating isolates, since in acute infections animals die before an antibody response is detected. As yet, few full genome sequences are available for ASFV, but reduction in virulence has been associated with genome changes including large deletions and sequence transpositions from one genome end to the other (Zani et al., 2018). Recovered animals may remain persistently infected over extended time periods of weeks or months. Shedding of virus and transmission from recovered animals to in-contact animals have been described, but it remains unclear whether these carrier animals play an important role in virus spread (Boinas et al., 2004; de Carvalho Ferreira et al., 2012; Gallardo et al., 2015; Petrov et al., 2018).
Interestingly ASFV has persisted in Sardinia for 40 years in a pig-pig-wild boar transmission cycle without loss of virulence and with few genetic changes (Granberg et al., 2016; Sanna et al., 2017). Therefore, the main mechanisms of ASFV persistence and transmission in different epidemiological scenarios clearly influence which types of ASFV isolates emerge and become predominant.

#### Influence of Host Genetics on the Outcome of Disease in Pigs

Most reports of ASF disease in domestic pigs or wild boar describe similar acute disease forms with high case fatality in all ages and breeds following infection with highly virulent ASFV isolates (Gabriel et al., 2011; Blome et al., 2013; Nurmoja et al., 2017). However there are also reports indicating differences in susceptibility to disease in some populations or ages of domestic pigs. In one study the percentage of older pigs surviving infection with a moderately virulent isolate was shown to be higher than for younger pigs (Post et al., 2017). In Mozambique, although some pigs were identified that survived infection with a virulent isolate, this apparent resistance was found not to be transmitted to offspring based on results of viral challenge experiments (Penrith et al., 2004). In some regions of Africa apparently healthy pigs have tested positive for virus or had ASFV specific antibodies without showing clinical signs of the disease (Uttenthal et al., 2013; Thomas et al., 2016; Abworo et al., 2017; Kipanyula and Nong'ona, 2017). SNP analysis was used to assess the genetic diversity of two populations of Kenyan pigs and compare them to bushpigs, warthogs, European wild boar as well as four breeds of commercial pigs. Principal component and admixture analyses identified six separate groups, with the two populations of Kenyan pigs forming two distinct groups alongside groups comprised of wild boar, Duroc pigs, African suids or the three other domestic pig breeds (Large white cross, Yorkshire, and Landrace). The failure to resolve bushpigs and warthogs as separate populations was likely due to few markers in the porcine SNP array being amplified in samples from these animals. The Homabay population from Kenya had a local indigenous composition distinct from commercial breeds. In contrast, pigs from Busia and the surrounding area were a nonhomogenous admixed population with significant introgression of genes from commercial breeds. 
Notably, pigs that tested negative for ASFV by PCR had a significantly higher proportion of local ancestry. Although serology was not performed to prove previous ASFV infection, the study provides some evidence that local ancestry confers a survival advantage against ASFV and a basis to explore genetic determinants underlying resistance to developing disease (Mujibi et al., 2018).

#### ASFV in Other Suid Species

The link between outbreaks of ASF and a wildlife reservoir was suspected during the emergence of the disease in the early twentieth century (Montgomery, 1921). Subsequent studies confirmed the isolation of infectious virus from apparently healthy warthogs associated with outbreaks of disease in domestic pigs in both Kenya and South Africa (De Kock et al., 1940; Hammond and Detray, 1955). Infectious virus has been recovered from bushpigs (Potamochoerus spp.), warthogs (Phacochoerus spp.), Ornithodoros ticks and a single giant forest hog (Hylochoerus meinertzhageni). The expansion of ASF into South-East Asia raises the possibility of transmission of the virus to other species and genera of suids which have not previously encountered the disease. Warty pigs and bearded pigs (all species of Sus) indigenous to Indonesia and the Philippines would be predicted to suffer similar disease outcomes to domestic pigs and wild boar. However, pygmy hogs (Porcula salvania) found in India and babirusa (Babyrousa spp.) from Indonesia are distinct genera (Funk et al., 2007) and their susceptibility to ASFV is unclear, although classical swine fever virus, an RNA virus that causes a disease with similar clinical signs to ASF, can infect and kill pygmy hogs (Barman et al., 2012). The wild populations of many of these species are of concern, with pygmy hogs and Visayan warty pigs (Sus cebifrons) considered critically endangered according to the International Union for Conservation of Nature (Narayan et al., 2008; Meijaard et al., 2017). Spillover of ASF into these wild suids could lead to other avenues for exploring disease resistance, but could add an unwelcome pressure on already threatened populations.

#### ASFV in Potamochoerus spp.

Bushpigs (Potamochoerus larvatus) are distributed throughout Eastern and Southern Africa, while red river hogs (Potamochoerus porcus) are found in sub-Saharan West and Central Africa. ASFV has been isolated from both bushpigs and red river hogs (**Figure 3**) and, as the two species are closely related and can interbreed, we will use bushpigs to refer to all Potamochoerus spp. ASFV infection does not induce clinical signs of disease in bushpigs, with virus titres in the blood and tissues 100-fold lower than the 8–9 logs typically seen in domestic pigs (Anderson et al., 1998; Oura et al., 1998a). Virus replication in tissues is also reduced and, although B-cell apoptosis in lymph nodes has been observed, it is not as extensive as that seen in domestic pigs and other structures are essentially unaffected. Experimentally infected bushpigs clear ASFV from the tissues (Detray, 1963; Anderson et al., 1998) and gain immunity to subsequent rechallenge with homologous virus strains. Bushpigs can transmit virus to feeding ticks and to in-contact pigs. Transmission to pigs depends on the frequency of contacts with domestic pigs and may also be strain specific (Anderson et al., 1998). The role of bushpigs in maintaining a reservoir of virus is unclear since they do not reside in burrows like warthogs and hence are not thought to come into frequent contact with Ornithodoros ticks.

#### ASFV in Phacochoerus spp.

There are two species of warthog in Africa, the common warthog (Phacochoerus africanus) which is distributed throughout sub-Saharan Africa and the desert warthog (Phacochoerus aethiopicus) which is restricted to the Horn of Africa and Northern Kenya. We will use the term warthog to refer to the common warthog as to our knowledge ASFV has not been isolated from Phacochoerus spp. within the known distribution of the desert warthog, although the recent outbreaks in the Tigray region in northern Ethiopia (Achenbach et al., 2017) suggest that it may only be a matter of time before this occurs. ASFV has been isolated from warthogs across Southern and Eastern Africa (**Figure 1**), and seropositive animals have also been found in Botswana and Zimbabwe (Jori et al., 2013).

Serological surveys suggest that infection rates in populations of warthogs where ASFV is endemic are typically greater than 80% (Plowright, 1977; Thomson, 1985), although viremia in wild adult warthogs is rare, with infectious virus mostly restricted to lymph nodes. However, wild caught neonatal animals from the Serengeti do exhibit detectable viremia (Plowright, 1977; Thomson, 1985) and experimental infection of naïve young warthogs also yields low viremia for several weeks which may be sufficient to infect ticks (Thomson et al., 1980; Anderson et al., 1998). Therefore it is likely that the warthog-tick sylvatic cycle is in part maintained by ticks transmitting the disease to 3–4-week-old warthogs that can then transmit the virus to naïve ticks. Interestingly, the proportion of ASFV positive ticks in warthog burrows in Western Uganda was found to be very low and the majority of warthogs in this area did not become seropositive until they were 6 months old (Plowright, 1977). In combination with the observation that warthogs in the central Kenyan highlands were seropositive in the absence of ticks, this suggests there are a number of different sylvatic cycles capable of maintaining a virus reservoir. Infectious virus persisted in warthog tissues up to 25 weeks post-infection, but was cleared by 56 weeks (Anderson et al., 1998). Field observations have demonstrated persistent infection of warthog tissues (Plowright et al., 1969b; Plowright, 1977). This could be explained by repeated re-infection of warthogs by ticks with the same virus strain. Warthogs probably develop an adaptive immune response to a given ASFV strain, which while insufficient to prevent replication at primary sites of infection can prevent an acute phase and hence virus dissemination into the blood stream.

hogs (Potamochoerus spp.), common and desert warthog (Phacochoerus spp.) and the giant forest hog. ASFV isolates for which the genotype has been determined are indicated by colored symbols. ASFV isolates from soft ticks (Ornithodoros moubata complex) are also indicated. Tick isolates were collected from warthog burrows, with the exception of the two genotype VIII isolates from Malawi and isolates of genotype II and XXIV from Mozambique which were collected from pig holdings. Each symbol indicates a single location which may represent up to 11 separate isolates, full details of these are provided in Supplementary Table S2. The positions of some symbols have been moved to aid clarity where multiple genotypes or hosts have been identified at the same sites.

#### ASFV in Ornithodoros spp.

ASFV has been isolated from Ornithodoros spp. ticks collected from warthog burrows from Kenya to South Africa (see **Figure 1**), although the proportion of ticks positive for virus is typically less than 1%. Virus is transmitted sexually and transstadially in ticks (Plowright et al., 1970) and can be isolated from all developmental stages (Plowright et al., 1969a; Quembo et al., 2018). Transovarial transmission of the virus has also been shown in ticks from the O. moubata complex. Detailed genetic and morphological analyses of Afrotropical Ornithodoros spp. have identified at least four species within each of the O. savignyi and O. moubata groups, only one of which is not thought to be associated with pigs or warthogs (Bakkes et al., 2018). However, O. phacochoerus, O. porcinus, and O. waterbergensis are the principal species linked to the sylvatic cycle. Although O. moubata spp. are true biological vectors of ASFV, virus replication can be deleterious to the tick (Hess et al., 1989) and experimental infection of Ornithodoros spp. from the Americas also causes tick mortality (Hess et al., 1987). Therefore, the relative ability of different Afrotropical Ornithodoros species to support the replication of different strains of ASFV may be an important aspect of the sylvatic cycle. Genetically related, but distinct strains of ASFV have been identified in ticks from separate warthog burrows within close proximity to each other (Dixon and Wilkinson, 1988; Wilkinson et al., 1988), demonstrating divergent evolution of ASFV within the sylvatic cycle. The sylvatic cycle in Africa provides a reservoir of persistently infected hosts to maintain the infection. In the current situation in Europe and China, wild boar and domestic pigs in most cases develop disease with high levels of case fatality. Thus maintaining a virus reservoir requires a readily available pool of susceptible hosts or an effective indirect transmission route.

As yet few virus genes have been identified which confer an advantage for replication in the tick vector (Burrage et al., 2004; Rowlands et al., 2009). A functional genomics approach, involving targeted gene deletions and modifications and testing the effect of these on virus replication in the tick, would provide further insights. The lack of a tick cell line susceptible to ASFV infection is a constraint, meaning that infection of live ticks is required to achieve this. Further comparative full genome sequencing of virus isolates from tick/warthog and domestic pig/wild boar cycles would also help to unravel virus adaptations and selections required for replication in the tick.

#### POTENTIAL MECHANISMS FOR HOST RESISTANCE

Due to the paucity of experimental and genetic data available it is difficult to draw conclusions about why warthogs and bushpigs exhibit limited clinical signs after infection with ASFV when compared to domestic pigs and wild boar. Viral replication is approximately 100-fold lower in bushpigs than in domestic pigs, and replication in warthogs is 10-fold lower than in bushpigs. Comparison of in vitro growth curves in macrophages suggests there is no intrinsic difference in the ability of target cells to support the growth of ASFV between the three species (Anderson et al., 1998). It is therefore likely that the innate immune response plays a key role in controlling the levels of virus replication and pathogenesis in different infected hosts. Thus in hosts which do not develop disease the innate immune response may better control virus replication and avoid a pathogenic response. This may involve both viral and host factors. For example, virus genetic factors may be less effective in controlling innate responses in the wild African suids compared to the domestic pig or wild boar. Alternatively, host genetic factors may reduce over-activation of potentially harmful responses and hence reduced clinical signs may also be due to host tolerance.

ASFV encodes a diverse combination of genes capable of suppressing the induction of type I IFN in domestic pigs. It is tempting to speculate that this functional redundancy of viral IFN inhibitory factors evolved to combat the effect of IFN in the natural host. It would therefore be interesting to compare type I IFN induction and responses in wild African suids with those in domestic pigs and wild boar. Human IFN stimulated genes Mx1 and IFITM (Netherton et al., 2009; Munoz-Moreno et al., 2016) inhibit ASFV replication in vitro; however, the effect of the suid homologs is unknown. Work in our laboratories is currently ongoing to determine the genetic and functional differences between the pig and warthog homologs of these genes.

NK cells are capable of killing virus infected cells, secreting immunomodulatory cytokines and activating dendritic cells, linking with the adaptive immune response. Subclinical infections of domestic pigs with low virulent strains of ASFV and protection in subsequent challenge studies are linked to enhanced NK cell activity (Leitao et al., 2001), so differences in the way these cells respond to ASFV could play a role in the ability of bushpigs and warthogs to control infection.

Interspecies differences in the pathology of ASFV could also be linked to differences in host response to infection. Like many hemorrhagic diseases, the pathology of ASFV in domestic pigs has been linked to the overexpression of cytokines such as IFN and tumor necrosis factor alpha (Oura et al., 1998b; Gómez del Moral et al., 1999; Golding et al., 2016). The NF-κB transcription factor controls transcription of both these cytokines and ASFV encodes proteins that can inhibit this pathway (Granja et al., 2006), but these viral NF-κB inhibitors could be less effective in warthogs and bushpigs compared to domestic pigs or wild boar. Alternatively, host transcription factors may be less active in warthogs and bushpigs. For example, reporter assays in monkey kidney cells and mouse embryonic fibroblasts show that the RELA subunit of NF-κB from the domestic pig has lower activity after induction by external stimuli than the warthog homolog, but has higher basal activity (Palgrave et al., 2011), and that this difference appears to be due to an S531P variant present in the warthog. Genome sequences will help develop additional avenues of research to understand the mechanisms responsible for differences in disease outcomes between domestic pigs, bushpigs and warthogs.

#### CONCLUSION AND FUTURE PERSPECTIVES

The mechanisms which result in reduced viral replication and lack of disease in African wild suids after ASFV infection are largely unknown. The data so far indicate that this is not due to an intrinsic difference in the ability of the virus to replicate in macrophages from these hosts. A more likely explanation is that the innate immune system of these hosts is better able to control virus replication resulting in a reduced systemic infection and reduced pathogenesis. This may involve a balance between virus and host factors which has evolved over long term infections of these hosts. Sequence information from African wild suids will enable further investigation of the interaction of ASFV with components of the innate immune system compared to domestic pigs and wild boar. A better understanding of ASFV mechanisms of evading host defenses will contribute to this. Of special interest are the functions of the many members of five MGF encoded by ASFV. As is the case in other viruses these may have evolved in the virus genome to modulate the host's innate immune response.

Genetic modification has been used to generate pigs resistant to porcine reproductive and respiratory syndrome virus (Whitworth et al., 2016; Burkard et al., 2018) or classical swine fever virus (Xie et al., 2018) and therefore could be a viable route to increase resistance to ASFV. Identified warthog or bushpig sequences could be engineered into the pig genome to generate animals in which replication and/or disease burden after ASFV infection is reduced. However, in order to generate a pig that is fully resistant to ASFV infection, as has been accomplished with porcine reproductive and respiratory syndrome virus, a more effective strategy may be to target essential elements of the viral replication cycle such as entry.

Different clinical courses of ASFV infection in pigs have been described, apparently largely due to the virulence of the virus isolates, and sequencing the genomes of isolates of reduced virulence has identified virus genes associated with this phenotype. Targeted gene modifications and deletions and testing of the genetically modified viruses in macrophages and in pigs have contributed to understanding of virulence factors and how the virus modulates host responses. There are no licensed ASFV vaccines available and further research in this area will also contribute to the development of live attenuated vaccines for ASFV.

The issue of whether the outcome of ASFV infection in pigs also depends on host genetics has been discussed and considered over a number of years without definite conclusions. Recent studies linking genetics of different pig breeds in Kenya with prevalence of ASFV infection are a promising step forward. Further study of these pigs to confirm that resistance to developing disease after ASFV infection is due to genetic differences rather than hitherto unknown environmental factors could open the possibility of breeding in resistance to the disease. Analysis of other African pig breeds with suspected disease resistance to ASFV may identify additional factors that could be incorporated into such a strategy. Viable bushpig-domestic pig hybrids have been observed in the field and these could open up another avenue of research if these animals were also resistant to disease.

In the longer term a better understanding of ASFV interactions with its different hosts will not only be of great scientific interest but will also lead to improved control strategies for this disease and help prevent global spread.

#### AUTHOR CONTRIBUTIONS

CN, SC, CB, and LD wrote individual sections of the manuscript. All authors contributed to the editing and revision of the manuscript, and read and approved the submitted version.

#### FUNDING

CN and LD were supported by the United Kingdom Biotechnology and Biological Sciences Research Council (BBSRC) through projects BBS/E/I/00007030, BBS/E/I/00007031, BBS/E/I/00007034, BBS/E/I/00007034, BBS/E/I/00007035, BBS/E/I/00007036, BBS/E/I/00007037, BBS/E/I/00007038, and BBS/E/I/00007039, and by the United Kingdom Department for Environment, Food & Rural Affairs through project SE1516. SC was supported by The Oxford Interdisciplinary Bioscience Doctoral Training Partnership funded by the BBSRC grant BB/M011224/1. CB was supported by a Royal Veterinary College Internal Grant (IGS 2930).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00402/full#supplementary-material

#### REFERENCES

virus in domestic pigs from Ethiopia. Transbound. Emerg. Dis. 64, 1393–1404. doi: 10.1111/tbed.12511



Detray, D. E. (1963). African swine fever. Adv. Vet. Sci. 8, 299–333.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2019 Netherton, Connell, Benfield and Dixon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Quantitative Trait Loci Mapping for Lameness Associated Phenotypes in Holstein–Friesian Dairy Cattle

*Enrique Sánchez-Molano1†, Veysel Bay2,3†, Robert F. Smith4, Georgios Oikonomou2,4‡ and Georgios Banos1,5\*‡*

*1 The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush, Edinburgh, United Kingdom, 2 Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom, 3 Bandirma Sheep Research Institute, The Ministry of Agriculture and Forestry, Balikesir, Turkey, 4 Institute of Veterinary Science, University of Liverpool, Leahurst Campus, Liverpool, United Kingdom, 5 The Roslin Institute Building, Scotland's Rural College, Easter Bush, Edinburgh, United Kingdom*

#### *Edited by:*

*John Anthony Hammond, Pirbright Institute, United Kingdom*

#### *Reviewed by:*

*Hermann H. Swalve, Martin Luther University of Halle-Wittenberg, Germany; Filippo Biscarini, Italian National Research Council (CNR), Italy*

*\*Correspondence:*

*Georgios Banos Georgios.Banos@roslin.ed.ac.uk*

*†These authors shared first authorship*

*‡These authors have contributed equally to this work*

#### *Specialty section:*

*This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics*

*Received: 10 September 2018 Accepted: 05 September 2019 Published: 04 October 2019*

#### *Citation:*

*Sánchez-Molano E, Bay V, Smith RF, Oikonomou G and Banos G (2019) Quantitative Trait Loci Mapping for Lameness Associated Phenotypes in Holstein–Friesian Dairy Cattle. Front. Genet. 10:926. doi: 10.3389/fgene.2019.00926*

Lameness represents a significant challenge for the dairy cattle industry, resulting in economic losses and reduced animal health and welfare. The existence of underlying genomic variation for lameness associated traits has the potential to improve selection strategies by using genomic markers. Therefore, the aim of this study was to identify genomic regions and potential candidate genes associated with lameness traits. Lameness related lesions and digital cushion thickness were studied using records collected by our research team, farm records, and a combination of both. Genome-wide analyses were performed to identify significant genomic effects, and a combination of single SNP association analysis and regional heritability mapping was used to identify associated genomic regions. Significant genomic effects were identified for several lameness related traits: two genomic regions were identified on chromosome 3 associated with digital dermatitis and interdigital hyperplasia, one genomic region on chromosome 23 associated with interdigital hyperplasia, and one genomic region on chromosome 2 associated with sole haemorrhage. Candidate genes in those regions are mainly related to immune response and fibroblast proliferation. Quantitative trait loci (QTL) identified in this study could improve the understanding of lameness pathogenesis, providing an opportunity to improve health and welfare in dairy cattle by incorporating these regions into selection programs.

#### Keywords: Lameness, GWAS, welfare, regional heritability mapping, QTL

## INTRODUCTION

Lameness is a complex trait defined as an abnormal stance or gait of the animal that results from disorders of the locomotor system. In dairy cattle, lameness is one of the most important health conditions together with impaired fertility and mastitis (Green et al., 2002; Buitenhuis et al., 2007; Cha et al., 2010), and causes important economic losses and reduced animal health and welfare (Huxley, 2013).

Many lameness cases are associated with various infectious and non-infectious diseases (Green et al., 2002; Van Der Waaij et al., 2005; Bicalho and Oikonomou, 2013), resulting in painful foot lesions such as sole ulcers, white line lesions, sole haemorrhages, interdigital hyperplasia, and others. Previous studies have shown that both animal genetic and management factors contribute to the development of these diseases (Olmos et al., 2009; Van Der Linde et al., 2010; Swalve et al., 2014). The existence of genetic variation underlying lameness-associated traits has been previously demonstrated using pedigree data analyses, with heritability estimates ranging from 0.06 to 0.52 (Boettcher et al., 1998; Zwald et al., 2004; Koenig et al., 2005; Laursen et al., 2009; Schopke et al., 2015). Furthermore, susceptibility to certain non-infectious foot lesions is also associated with morphological hoof traits such as the thickness of the digital cushion, a complex, force dissipating, subcutaneous tissue located under the distal phalanx (Bicalho et al., 2009).

While reducing the incidence of lameness is one of the main objectives for the dairy cattle industry, current evaluations are based on observational scores such as claw health status data, lameness and mobility scores, conformational traits, and data collected by automated sensors (Heringstad et al., 2018). All this information is obtained once the animal has started to show symptoms of lameness; it is therefore not available early in life, and the traits also show relatively low heritabilities (Buitenhuis et al., 2007). Therefore, the identification of genomic regions and genes associated with lameness and lameness associated traits could strongly improve selection strategies by providing genomic information to make early breeding decisions and potentially informing more accurate genomic-based selection programmes.

Few studies have addressed traits associated with lameness using a genomic approach. The largest study (Malchiodi et al., 2018) grouped lameness-associated lesions into two categories, infectious and non-infectious, and used single nucleotide polymorphism (SNP) data to identify several genomic regions with candidate genes linked to immune system, morphogenesis, and cell proliferation. Other studies used microsatellite data (Buitenhuis et al., 2007) to identify lameness-associated regions and SNP data (Scholey et al., 2012; Swalve et al., 2014) to identify regions associated with digital dermatitis and sole hemorrhage. The relatively high number of identified regions together with the complex aetiology of lameness seems to support a potential polygenic architecture with many genes influencing the different biological factors involved. Therefore, it is necessary to study the different types of lesions separately in order to identify particular and common genomic regions that contribute to the main condition phenotype.

The objective of the present study is to perform genome-wide analyses to identify regions and candidate genes, and understand the genetic basis of a wide range of lameness-related traits. This knowledge may inform genetic improvement schemes aiming to reduce prevalence of dairy cattle lameness. Digital cushion thickness (DCT) measurements are studied here for the first time from a genomic perspective.

#### MATERIALS AND METHODS

#### Animals and Phenotypes

Ethical approval for the study was granted by the University of Liverpool Research Ethics Committee. ASPA regulated procedures were conducted under a Home Office License (Reference Number: PPL 70/8330).

The study included a total of 554 Holstein–Friesian cows in lactation 0–8 from three different farms. Data sources were from a combination of previous research projects and routinely kept farm records. The recorded lameness-causing foot lesions were digital dermatitis, sole ulcer, white line disease, sole haemorrhage, and interdigital hyperplasia. Cases were defined following the ICAR Claw Health Atlas definitions (Egger-Danner et al., 2015).

Farm phenotypic records for presence (1) or absence (0) of these lesions were extracted from the farm database for all these animals (single record per animal) from May 2006 to October 2017 using TotalVet software (Sum-IT). In addition, 475 cows were individually monitored for the same lesions by a research team led by an experienced veterinarian during three separate time intervals between December 2014 and October 2017. DCT measurements were taken using ultrasonography between December 2014 and January 2016 (1st research interval), and between October 2016 and August 2017 (2nd research interval). A number of cows were also followed during the period between August 2017 and October 2017, but no DCT measurements were obtained (3rd research interval). Lameness lesions were recorded by the veterinarian for 88 animals between December 2014 and January 2016, and for 337 animals between October 2016 and October 2017; 50 cows had records from both of these time intervals. DCT measurements and recording of lameness-causing lesions were performed at three time points around each animal's calving: 3–4 weeks before the expected calving, the first week after calving, and approximately 8 weeks after calving. Seventy-nine animals only had lameness lesion records obtained from the farms' records. Research and farm records were analysed both separately and combined. For the latter, animals were considered as affected when at least one of the available records (research or farm) indicated presence of the lesion. All these data are summarised in **Table 1**.
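
The rule used to build the combined phenotype can be illustrated with a minimal sketch. The function name, the cow identifiers, and the use of `None` for a missing record are assumptions made for illustration only; they are not taken from the study's data files.

```python
from typing import Optional


def combine_records(research: Optional[int], farm: Optional[int]) -> Optional[int]:
    """Combine two binary lesion records (1 = present, 0 = absent).

    An animal counts as affected when at least one available record
    indicates the lesion; None marks a record that is not available.
    """
    available = [r for r in (research, farm) if r is not None]
    if not available:
        return None  # no information from either source
    return 1 if any(r == 1 for r in available) else 0


# Hypothetical animals: (research record, farm record)
cows = {
    "cow_a": (0, 1),     # lesion seen only in farm records
    "cow_b": (0, 0),     # never recorded as affected
    "cow_c": (None, 1),  # farm record only
    "cow_d": (1, None),  # research record only
}
combined = {cow: combine_records(*records) for cow, records in cows.items()}
print(combined)
```

Under this rule the combined phenotype can only gain cases relative to either source alone, which is one reason the three record sets are analysed separately as well as together.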

Cows were restrained in a foot trimming crush for the measurement of DCT and the recording of lameness-causing foot lesions. Measurement of DCT was performed using an Easi-Scan ultrasound machine (sonographic B-mode, BCF™ Technology, UK) equipped with a linear probe 5–8 MHz. All measurements of DCT were undertaken at the midline, on the lateral claw of the hind left foot. To measure the DCT, the foot was cleaned and loose horn was removed with a hoof knife. Sole contact with the transducer was made using ultrasound gel (Ultrasound Gel, Henry Schein) and a gel standoff (Flexi gel standoff, BCF™ Technology, UK). After freezing the image on the ultrasound monitor (Easi-Scan Ultrasound Remote Display, BCF™ Technology, UK), measurements were taken to the nearest millimetre. The DCT was measured just cranial to the tuberculum flexorum of the pedal bone at the typical sole ulcer site. The distance from the inner margin of the sole (identified as a thin echogenic line) to the distal edge of the pedal bone (identified as a thick echogenic line) was assessed.

*Source of data collection: 0 = farm records only, 1 = research data collected during 1st research interval (December 2014 and January 2016), 2 = research data collected during 2nd research interval (October 2016 and August 2017), 3 = research data collected during 3rd research interval (August 2017 and October 2017), 4 = research data collected during 1st and 2nd research intervals, 5 = research data collected during 1st and 3rd research intervals.*

#### DNA Sampling, Extraction, and Genotyping

Blood samples were collected from the tail vein of each cow in vacutainer tubes containing EDTA. Genomic DNA was extracted from buffy coat samples using the QIAamp DNA Blood MiniKit from Qiagen. Extracted DNA samples were quantified using a NanoDrop and stored at -20°C. Initially, 266 cows were genotyped using the Affymetrix Axiom bovine 54K SNP array. The Illumina BovineSNP50 bead chip containing 53,714 SNPs was used to genotype the rest of the animals. Genotype data obtained from the Affymetrix array were converted to the Illumina chip format by selecting common SNPs using concordant strand assignment and identified allelic calls before further analyses were conducted.

#### Sample and Genotype Quality Control

Quality control was performed using PLINK (Purcell et al., 2007) in order to assess both sample and marker quality. A minimum genotype call rate of 95% was applied, removing SNPs with low genotyping quality. Further quality control on the markers removed those with low minor allele frequency (MAF < 0.01) and showing strong deviations from Hardy-Weinberg equilibrium (threshold of 1.45E-6 calculated genome-wide by applying a Bonferroni correction to obtain a nominal *P*-value of 0.05). The final genotype call rate after genotype quality control was 98.7%. Additional quality control of samples was performed by removing individuals with poor genotype quality (sample call rate lower than 95%). All these quality control procedures resulted in a final dataset of 549 animals genotyped for 34,658 SNPs with positions assigned according to the UMD 3.1 assembly.
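
As a rough illustration of the marker filters described above, the sketch below computes the call rate and minor allele frequency for a single SNP and applies the 95% and 0.01 cut-offs. It is not the PLINK implementation: the function names and the toy genotype vectors are hypothetical, genotypes are assumed to be coded 0/1/2 with `None` for a missing call, and the Hardy-Weinberg test is deliberately left out.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of animals with a non-missing genotype at this SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_snp_qc(genotypes: Sequence[Genotype],
                  min_call_rate: float = 0.95,
                  min_maf: float = 0.01) -> bool:
    """Apply the call-rate and MAF filters (the HWE filter is not included)."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Toy genotype vectors for three hypothetical SNPs
snp_ok = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1]
snp_monomorphic = [0] * 100
snp_missing = [1, None, None, 0]

for name, snp in [("ok", snp_ok), ("monomorphic", snp_monomorphic), ("missing", snp_missing)]:
    print(name, passes_snp_qc(snp))
```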

#### Population Structure

Principal component analyses of the genotyped animals were performed using GEMMA (Zhou and Stephens, 2012), replacing missing genotypes with the average genotype. Visual exploration showed a relatively light population structure not explained by any of the descriptive factors (e.g., farm, parity number, lactation, etc.). Therefore, the genomic relationship matrix (GRM) among animals was fitted in all ensuing statistical models of analysis as a random polygenic effect to account for any potential inflation effects caused by population structure.

#### Estimation of Variance Components

Estimates of the variance components for each individual trait were obtained by fitting the following model using REACTA (Cebamanos et al., 2014):

$$\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon} \tag{1}$$

where **y** represents the vector of phenotypes, **W** is an incidence matrix, **α** is the vector of associated fixed effects, **Z** is the design matrix for the vector **u** of random polygenic effects [distributed as a multivariate normal distribution MVN(0,Vg**G**) with **G** being the GRM and Vg the genetic variance of the trait], and **ε** represents the vector of residual errors [distributed as MVN(0,Ve**I**) with **I** being the identity matrix and Ve the residual variance]. The significance of the genomic (polygenic) effect (*P* = 0.05) was assessed using the likelihood ratio test statistic to compare a model that fits the effect against the base model that excludes it.
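
The likelihood ratio comparison can be sketched as follows. This is an illustration only: the log-likelihood values are hypothetical, and the text does not state which reference distribution was used, so a chi-square distribution with one degree of freedom is assumed here (a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom, often used for variance components, would halve the p-value).

```python
import math


def lrt_statistic(loglik_full: float, loglik_null: float) -> float:
    """Twice the log-likelihood difference between the full and base models."""
    return max(0.0, 2.0 * (loglik_full - loglik_null))


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square(1 df): P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for a model with and without the polygenic effect
loglik_with_polygenic = -100.0
loglik_without_polygenic = -102.0

stat = lrt_statistic(loglik_with_polygenic, loglik_without_polygenic)
p_value = chi2_1df_pvalue(stat)
print(f"LRT = {stat:.2f}, significant at 0.05: {p_value < 0.05}")
```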

Fixed effects used in model (1) were tested previously using Wald tests in ASReml 4 (Gilmour et al., 2009), fitting a logit model for disease traits and a linear model for DCT records, and following a backward elimination approach. After performing analyses for all traits, concordant models were chosen incorporating as fixed effects: i) for the disease research records: farm (three levels), parity number at recording (three levels: 1, 2, and ≥3), and research interval (five levels, grouped as 1 = interval 1, 2 = interval 2, 3 = interval 3, 4 = intervals 1 and 2, 5 = intervals 1 and 3); ii) for the farm and combined disease records: farm (as before), lactation number at the end of study (four levels: 0, 1, 2, and ≥3), and interval (as before); and iii) for the DCT records: farm (as before), parity number (as before), and assessor (six levels).

#### Genome-Wide Association Analysis

Individual SNP association analyses were performed in those traits with a significant genomic effect from model (1) using GEMMA (Zhou and Stephens, 2012). The linear mixed model was:

$$\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{x}\beta + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon} \tag{2}$$

where **x** represents the vector of genotypes (coded as 0/1/2) and β is the regression coefficient of the phenotype on the genotypes; all other effects are as described in model (1). The statistical significance of the regression coefficient was assessed using a Wald test. When determining the significance thresholds, a Bonferroni correction was performed for multiple testing due to the number of markers, but not for multiple traits. This resulted in a genome-wide significance threshold (*P* = 0.05) defined at *P* = 1.44E-6 [-log10(*P*) = 5.84] and a suggestive threshold (one false positive *per* genome scan, *P* = 1) defined at *P* = 2.89E-5 [-log10(*P*) = 4.54].
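
Both thresholds follow directly from the number of markers retained after quality control (34,658 SNPs). The short sketch below reproduces the arithmetic; the function name is ours and is not part of GEMMA.

```python
import math


def bonferroni_thresholds(n_tests: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (genome-wide significance, suggestive) p-value thresholds.

    The significance threshold divides alpha by the number of tests; the
    suggestive threshold allows one expected false positive per scan.
    """
    return alpha / n_tests, 1.0 / n_tests


n_snps = 34658  # SNPs remaining after quality control
significant, suggestive = bonferroni_thresholds(n_snps)

print(f"significant: P = {significant:.2e}, -log10(P) = {-math.log10(significant):.2f}")
print(f"suggestive:  P = {suggestive:.2e}, -log10(P) = {-math.log10(suggestive):.2f}")
```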

Despite including the polygenic effect in the model, genotyping errors or other artefacts such as cryptic population structure may inflate test statistics. Therefore, to account for any potential remaining inflation, the ratio of the median of the empirically observed distribution of the test statistic to the expected median (inflation factor λ) was used for correction, following the method described by Amin et al. (2007), which assumes that the inflation is constant across the genome.
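
A minimal sketch of this kind of genomic-control correction is given below, assuming one-degree-of-freedom test statistics recovered from p-values. It is not the implementation of Amin et al. (2007): the function names are hypothetical, and leaving the statistics unchanged when λ is below 1 is an assumption of this sketch.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.455)
EXPECTED_MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def pvalue_to_chi2(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to a chi-square(1 df) statistic."""
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def chi2_to_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square(1 df) statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def inflation_factor(pvalues: list[float]) -> float:
    """Ratio of the observed median test statistic to its expected median."""
    return median(pvalue_to_chi2(p) for p in pvalues) / EXPECTED_MEDIAN_CHI2_1DF


def corrected_pvalues(pvalues: list[float]) -> list[float]:
    """Divide every statistic by a single genome-wide lambda (floored at 1)."""
    lam = max(1.0, inflation_factor(pvalues))
    return [chi2_to_pvalue(pvalue_to_chi2(p) / lam) for p in pvalues]


# Toy example: evenly spaced p-values behave like a well-calibrated scan
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(uniform_p), 3))
```

Because a single λ rescales every statistic, the correction cannot remove inflation that is concentrated in particular genomic regions, which is the constant-inflation assumption noted above.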

#### Regional Heritability Mapping

Under the regional heritability mapping approach, the genome was divided into non-overlapping windows of 20 consecutive SNPs. The following model was used in REACTA (Cebamanos et al., 2014):

$$\mathbf{y} = \mathbf{W}\boldsymbol{\alpha} + \mathbf{X}\mathbf{u}_{(i)} + \mathbf{Z}\mathbf{u}_{(-i)} + \boldsymbol{\varepsilon} \tag{3}$$

where **X** and **Z** are the corresponding design matrices for the effects **u**(i) of the corresponding region *i* {distributed as MVN[0,Vg(i)**G**(i)], with Vg(i) and **G**(i) being the genomic variance and the GRM corresponding to the SNPs in the *i*th region, respectively} and **u**(-i) of the genome (polygenic effect) excluding the region *i* {distributed as MVN[0,Vg(-i)**G**(-i)], with Vg(-i) and **G**(-i) being the genetic variance and the GRM corresponding to all SNPs other than those in region *i*, respectively}.

The significance of the region effect was assessed using the likelihood ratio test statistic. A total of 1,733 regions were analysed, leading to a genome-wide significant threshold (*P* = 0.05) defined at *P* = 2.89E-5 with Bonferroni correction for multiple regions [-log10(*P*) = 4.54] and a suggestive threshold (one false positive *per* genome scan) defined at *P* = 5.77E-4 [-log10(*P*) = 3.24]. As with the genome-wide association analyses, a correction by the inflation factor λ was applied to account for any remaining inflation after fitting the polygenic effect in the model.
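
The window count and the two thresholds can be checked with a short sketch. It treats the 34,658 SNPs as one ordered list and ignores chromosome boundaries, which is a simplifying assumption of this illustration rather than a description of how REACTA defines regions; under that assumption the count matches the 1,733 regions reported.

```python
import math


def split_into_windows(n_snps: int, window_size: int = 20) -> list[range]:
    """Partition SNP indices 0..n_snps-1 into consecutive non-overlapping windows."""
    return [range(start, min(start + window_size, n_snps))
            for start in range(0, n_snps, window_size)]


windows = split_into_windows(34658)
n_regions = len(windows)

significant = 0.05 / n_regions  # Bonferroni correction across regions
suggestive = 1.0 / n_regions    # one expected false positive per scan

print(n_regions, len(windows[-1]))
print(f"{significant:.2e} {-math.log10(significant):.2f}")
print(f"{suggestive:.2e} {-math.log10(suggestive):.2f}")
```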

#### RESULTS

#### Descriptive Statistics

The number of cows enrolled *per* farm and *per* period of data collection is presented in **Table 1**. Incidences of each lameness-inducing foot lesion *per* farm are summarised in **Table 2**. The numbers of cows *per* farm with no, one, two, or three or more of the recorded lesions are presented in **Table 3**.

#### Population Structure

**Figure 1A** shows the eigenvalues corresponding to the principal component analysis performed on the GRM of the genotyped animals. The first seven principal components accounted for about 10% of the total variance, with the first three components explaining 2.19%, 1.83%, and 1.50%, respectively. **Figure 1B** shows a light population structure mainly due to the first principal component. No population attributes were available that could explain this. Therefore, a polygenic effect based on GRM was fitted to account for this light population structure.

TABLE 2 | Incidence of digital dermatitis (DD), interdigital hyperplasia (IH), sole haemorrhage (SH), sole ulcer (SU), and white line disease (WLD) *per* farm.

TABLE 3 | Number of cows *per* farm with zero, one, two, and three or more recorded lesions (digital dermatitis, interdigital hyperplasia, sole haemorrhage, sole ulcer, and white line disease).

#### Full Genomic Variance Analysis

**Table 4** shows the variance component estimates on the observed scale for those traits with a significant genomic effect (*P* < 0.05) based on the likelihood ratio test. Heritability estimates for the disease traits ranged from 0.129 (white line disease in combined records) to 0.516 (interdigital hyperplasia in research records), corresponding to genomic variances in the range of 0.008–0.067. Heritability for DCT at calving was moderate (0.228), corresponding to a genomic variance of 0.549.
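The link between the heritabilities and the variance components can be made explicit with a short sketch. It assumes the usual definition h2 = Vg / (Vg + Ve); the residual variance computed for DCT at calving is back-calculated from the two values quoted above for illustration and is not a figure taken from Table 4.

```python
def heritability(v_g, v_e):
    """Genomic heritability from genomic (Vg) and residual (Ve) variance."""
    return v_g / (v_g + v_e)


def implied_residual_variance(v_g, h2):
    """Residual variance implied by a genomic variance and a heritability."""
    return v_g * (1.0 - h2) / h2


# DCT at calving: h2 = 0.228 and Vg = 0.549 as quoted in the text.
# The value below is derived for illustration, not a reported estimate.
ve_dct = implied_residual_variance(0.549, 0.228)
h2_check = heritability(0.549, ve_dct)  # recovers 0.228 up to rounding
```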

#### Genome-Wide Association Analysis

Two genome-wide significant SNPs (for interdigital hyperplasia and digital dermatitis in farm records) and 19 genome-wide suggestive SNPs were detected in the genome-wide association analyses (**Table 5**). After performing the correction by the inflation factor, all λ estimates ranged from 1.002 to 1.033, thus implying the absence of any significant inflation in the test estimates.
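As an illustration of the inflation-factor correction, the sketch below estimates λ as the ratio of the median observed test statistic to the median of a chi-square distribution with one degree of freedom, then rescales the statistics. This is the standard genomic control recipe and is assumed here; the exact procedure and software used in the study are not given in this section, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square(1) variable, about 0.455
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def chi2_to_p(stat):
    """Survival function of the chi-square(1) distribution."""
    return math.erfc(math.sqrt(stat / 2.0))


def inflation_factor(pvalues):
    """Genomic inflation factor: median observed over median expected statistic."""
    return median(p_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def genomic_control(pvalues):
    """Return (lambda, corrected P-values) after dividing statistics by lambda."""
    lam = inflation_factor(pvalues)
    return lam, [chi2_to_p(p_to_chi2(p) / lam) for p in pvalues]


# Perfectly uniform P-values give lambda close to 1 (no inflation)
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam_uniform, _ = genomic_control(uniform_p)
```

A λ close to 1, as reported above, means that dividing by λ barely changes the P-values.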

MAFs ranged from 0.020 to 0.479, and most substitution effects were positive (with the exception of white line disease in the combined records), thus implying a positive effect of the minor allele against the disease. However, due to a possible Beavis effect (Xu, 2003), the provided effect sizes may be slightly overestimated.

DCT at calving and sole ulcer in the combined records did not provide any significant or suggestive SNP despite showing a significant genomic effect in the previous analyses.

#### Regional Heritability Mapping and Concordant Regions

A significant region for interdigital hyperplasia was detected based on farm records, and 10 suggestive regions were detected for other traits using the regional heritability mapping approach (**Table 6**). The significant region detected on chromosome 3 for digital dermatitis using research records was also detected as suggestive for interdigital hyperplasia using the combined records.

TABLE 4 | Estimates of heritability and variance components for traits with a significant (*P* < 0.05) genomic effect.

*Genomic heritabilities (h2) and genomic (Vg) and residual variances (Ve) estimated together with their standard errors. P-values (P) for the significance of the genomic effect and the number of total records (N). Digital cushion thickness at calving (DCT\_fresh), digital dermatitis (DD), interdigital hyperplasia (IH), sole haemorrhage (SH), sole ulcer (SU), and white line disease (WLD).*

each of the principal components. (B) shows the population structure according to the first and second principal components.

Four of the detected regions were concordant with suggestive/significant SNPs detected in the genome-wide association analyses (**Table 5** and **Supplementary Figure 1**), two of them detected from the farm records and another two from the combined farm and research records. In the farm records, the concordant regions detected by both genomic analyses explained 32.90% and 43.90% of the total genomic variance of digital dermatitis and interdigital hyperplasia, respectively. In the combined records, the concordant regions explained 10.53% and 11.29% of the total genomic variance of interdigital hyperplasia and sole haemorrhage, respectively. Again, caution must be exercised when assessing these findings because of possible overestimation due to a potential Beavis effect (Xu, 2003).

#### DISCUSSION

In the present study, two genome-wide association approaches were used to identify quantitative trait loci (QTL) affecting lameness-related traits. Comparison of the results provided by individual SNP genome-wide association analyses and regional heritability mapping was performed to strengthen the evidence of the identified regions. QTLs were detected for digital dermatitis, interdigital hyperplasia, and sole haemorrhage.

TABLE 5 | Significant SNP from the genome-wide association analyses.

*Chromosome (BTA) and base pair position follow the UMD 3.1 assembly. Beta coefficient (minor allele substitution effect) and standard error, minor allele frequency (MAF), and P-value (P) for the beta coefficient. Digital dermatitis (DD), interdigital hyperplasia (IH), sole haemorrhage (SH), sole ulcer (SU), and white line disease (WLD).*

*Chromosome (BTA) and base pair position follow the UMD 3.1 assembly. P-value (P) for the region effect. Digital dermatitis (DD), interdigital hyperplasia (IH), sole haemorrhage (SH), sole ulcer (SU), and white line disease (WLD).*

Three sources of data were used in this study to identify genomic regions for lameness-related diseases. Although using only research-confirmed records is expected to provide more accurate phenotypes than using farm records, the number of observations available was smaller, thus reducing the power to detect significant genomic effects. Conversely, farm records provided a larger number of records spanning the animals' whole lifetime; these records, however, are potentially less accurate, which lowers detection power and increases the chance of introducing misclassification bias. The impact of phenotypic errors in genomic analyses has been discussed before in the context of both human (Buyske et al., 2009) and animal data (Biffani et al., 2017). Furthermore, it has been shown previously that farm records can seriously under-record certain lesions (Heringstad et al., 2018), and this was also the case in our dataset. The most powerful dataset available in the present study was the combination of farm and research records, which provided more records than the research records alone and more accurate phenotypes than the farm records alone, thus leading to significant estimates of genomic effects for more traits.

Heritabilities for some of these traits have been previously estimated using pedigree data (Koenig et al., 2005; Van Der Waaij et al., 2005; Gernand et al., 2012; Oberbauer et al., 2013; Van Der Spek et al., 2013; Malchiodi et al., 2017). Such estimates range from 0.07 to 0.40 for digital dermatitis, 0.10 to 0.39 for interdigital hyperplasia, and 0.04 to 0.17 for sole haemorrhage. Our heritability estimates were generally within these ranges, particularly the estimates obtained using the combined records. It has to be recognised that heritability estimates are presented on the observed scale (0–1) and, therefore, population parameters are dependent on the disease prevalence (Lee et al., 2011). However, it is expected that estimates will not vary widely when transformed to the liability scale (normal distribution), where the mean and variance are independent of the prevalence of the disease. In the case of digital dermatitis, the heritability observed for the combined records was 0.20, resulting in estimates between 0.21 and 0.30 on the liability scale when assuming a disease prevalence from 10% to 30% (Holzhauer et al., 2006). Similarly, the heritability observed for interdigital hyperplasia was 0.37, resulting in a heritability of 0.39 on the liability scale when assuming a prevalence of 1.3% (Solano et al., 2016).
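For readers unfamiliar with the scale transformation, the sketch below implements the commonly used conversion from the observed to the liability scale, including the adjustment for the proportion of cases in the sample as described by Lee et al. (2011). It is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and the sample case proportions needed to reproduce the values quoted above are not given in this section.

```python
from statistics import NormalDist


def liability_h2(h2_observed, prevalence, sample_case_fraction=None):
    """Transform an observed-scale heritability to the liability scale.

    prevalence (K): assumed population prevalence of the disease.
    sample_case_fraction (P): proportion of cases in the analysed sample;
    if omitted, it is assumed equal to K (no ascertainment), which reduces
    to the classical h2_obs * K(1-K) / z^2 transformation.
    """
    k = prevalence
    p = k if sample_case_fraction is None else sample_case_fraction
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)   # liability threshold
    z = std_normal.pdf(threshold)             # normal density at the threshold
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical usage: observed h2 of 0.20, assumed prevalence of 10%,
# and an assumed (not reported here) case fraction of 50% in the sample
h2_liab = liability_h2(0.20, 0.10, sample_case_fraction=0.50)
```

The result depends strongly on the case fraction in the sample, so the transformed value should always be read together with the prevalence and case fraction that were assumed.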

Although genome-wide association and regional heritability mapping analyses revealed several QTLs independently, four QTLs were commonly reported by both approaches (**Tables 5** and **6** and **Supplementary Figure 1**). On chromosome 3, a suggestive region was associated with digital dermatitis in farm records, explaining 32.90% of the total genomic variance and also being suggestive for interdigital hyperplasia in the combined records. Two potential candidate genes are contained within this region: i) *FPGT* (fucose-1-phosphate guanylyl transferase), part of the pathway of L-fucose, a key sugar in complex carbohydrates involved in cell-to-cell recognition, inflammation, and immune processes (Becker and Lowe, 2003); and ii) *TNNI3K* (serine/threonine-protein kinase TNNI3K), also associated with inflammation mechanisms (Wiltshire et al., 2011). Based on these functions, we surmise that a candidate gene for lameness resistance may be found within this QTL.

On chromosome 23, a significant region was associated with interdigital hyperplasia in farm records, explaining 43.90% of the total genomic variance. Interdigital hyperplasia, also known as interdigital fibroma (Atkinson, 2013), results in a thickening of interdigital connective tissue causing fibroid tumours. Thus, a potential candidate gene found within this region is *EDN1* (endothelin-1), a vasoconstrictor associated with several cardiovascular diseases and inflammatory and fibrotic processes (Matsushima et al., 2004), acting as a fibroblast mitogen in systemic sclerosis (Vancheeswaran et al., 1994), pulmonary fibrosis (Hocher et al., 2000), and hepatic fibrosis (Rockey and Chung, 1996).

On chromosome 3, another suggestive region was associated with interdigital hyperplasia using combined records, explaining 10.53% of the total genomic variance. This region includes several potential candidate genes, particularly *PHGDH* (D-3-phosphoglycerate dehydrogenase), an oxidoreductase that has been associated previously with pulmonary fibrosis (Hamanaka et al., 2017).

On chromosome 2, a suggestive region was associated with sole haemorrhage from the combined records, explaining 11.29% of the total genomic variance. With sole haemorrhage being related to an impaired vascular system and cellular inflammatory reactions (Ossent and Lischer, 1998), a potential candidate gene within this region is *GPR17* (uracil nucleotide/cysteinyl leukotriene receptor). This gene acts as a sensor molecule involved in traumatic, vascular, and inflammatory pathologies in the central nervous system (Boda et al., 2011), and is also related to vascular permeability and inflammatory processes as a regulator of the cysteinyl leukotriene 1 receptor response (Maekawa et al., 2009).

Most candidate genes within the detected genomic regions are related either to inflammatory processes or fibroblast proliferation, as expected due to the nature of the analysed traits. However, these are not independent processes, but linked networks where fibroblasts present complex biosynthetic pathways, playing a role in pathogenesis and mediating inflammatory processes through their proliferation (Smith, 2005). Thus, the analysed traits are expected to present a complex genomic architecture with several genes and pathways involved in their phenotypic expression. This is concordant with the possible overestimation of the SNP and regional effects due to the Beavis effect (Xu, 2003) as well as with the lack of consistency across QTLs detected in several studies (Buitenhuis et al., 2007; Scholey et al., 2012; Swalve et al., 2014; Malchiodi et al., 2018). Therefore, it is expected that increasing the sample size with additional accurate records will increase the accuracy of the estimates of these effects.

DCT is a novel trait analysed using genomic data for the first time in the present study. Previous studies have shown an association between lameness-related diseases, such as sole ulcer and white line disease, and a thinner digital cushion, also indicating a potential change in the tissue composition of the cushion (Bicalho et al., 2009). In our study, given the relatively small number of samples available, no significant or suggestive markers were identified. However, a significant genomic effect was detected for DCT at calving, providing a moderate genomic heritability of 0.23 ± 0.12. When compared with the heritability of 0.33 obtained in a previous pedigree-based study (Oikonomou et al., 2014), our estimate was smaller but within the standard error boundaries.

As with the lameness-associated lesions, the genomic architecture of DCT traits is expected to be polygenic, being particularly related to body fatty acid and lipid metabolism. Further studies with an increased sample size will refine the heritability estimates and provide some potential candidate genes associated with this structure.

#### CONCLUSION

Four genomic regions were identified for digital dermatitis, interdigital hyperplasia, and sole haemorrhage, harbouring genes involved in inflammatory and fibroblastic processes. These traits are moderately heritable and potentially associated with a polygenic architecture. Therefore, the identification of associated regions may be useful to inform genomic selection programmes against lameness and to increase our knowledge of the underlying pathology.

In addition, this is the first study to address DCT from a genomic perspective, showing a moderate genomic heritability for this structure during the period of calving. The genomic architecture of this trait warrants further research attention.

#### DATA AVAILABILITY STATEMENT

The genotype data have been uploaded to a public repository hosted by the University of Edinburgh. Genotypes are therefore publicly available and can be obtained from Edinburgh DataShare (University of Edinburgh): https://datashare.is.ed.ac.uk/handle/10283/3409.

#### ETHICS STATEMENT

Ethical approval for the study was granted by the University of Liverpool Research Ethics Committee. ASPA regulated procedures were conducted under a Home Office License (Reference Number: PPL 70/8330).

#### AUTHOR CONTRIBUTIONS

ES-M analyzed the data and co-wrote the first draft of the manuscript. VB collected the data, performed laboratory work, analyzed the data, and co-wrote the first draft of the manuscript. RS provided funding and critically evaluated the manuscript. GO and GB secured funding, designed and supervised the study, and critically evaluated the manuscript.

#### FUNDING

This study was funded by the Academy of Medical Sciences. VB acknowledges support from The Turkish Ministry of Education, and Ministry of Food, Agriculture, and Livestock. GO gratefully acknowledges support from the Wellcome Trust. GB and ES-M gratefully acknowledge funding from The Roslin Institute Strategic Programme (ISP) grants, and GB also the Rural and Environment Science and Analytical Services Division of the Scottish Government.

#### REFERENCES


#### ACKNOWLEDGMENTS

Bethany Griffiths, Lara Robinson, Nick Britten, Hannah Tatham, Rebecca Jenkin, and Nikos Kakatsidis are gratefully acknowledged for their assistance during data collection. Dr George Wiggans is gratefully acknowledged for converting the Affymetrix genotypes into an Illumina chip format.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00926/full#supplementary-material

SUPPLEMENTARY FIGURE 1 | Genome-wide association (top) and regional heritability mapping (bottom) results by trait. The figure shows the Manhattan plots for each of the traits where concordant regions between both analyses were observed [(A) digital dermatitis for farm records; (B) interdigital hyperplasia for farm records; (C) interdigital hyperplasia for combined records; and (D) sole haemorrhage for combined records]. Red lines correspond to the genome-wide significant threshold, and blue lines correspond to the genome-wide suggestive threshold.


**Conflict of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2019 Sánchez-Molano, Bay, Smith, Oikonomou and Banos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
