Genome-Wide Association Study Reveals Candidate Genes for Growth Relevant Traits in Pigs

Improvement of the growth rate is a challenge in the pig industry, the Average Daily Gain (ADG) and Days (AGE) to 100 kg are directly related to growth performance. We performed genome-wide association study (GWAS) and genetic parameters estimation for ADG and AGE using the genomic and phonemic from four breed (Duroc, Yorkshire, Landrace, and Pietrain) populations. All analyses were performed by a multi-loci GWAS model, FarmCPU. The GWAS results of all four breeds indicate that five genome-wide significant SNPs were associated with ADG, and the nearby genomic regions explained 4.08% of the genetic variance and 1.90% of the phenotypic variance, respectively. For AGE, six genome-wide significant SNPs were detected, and the nearby genomic regions explained 8.09% of the genetic variance and 3.52% of phenotypic variance, respectively. In total, nine candidate genes were identified to be associated with growth and metabolism. Among them, TRIB3 was reported to associate with pig growth, GRP, TTR, CNR1, GLP1R, BRD2, HCRTR2, SEC11C, and ssc-mir-122 were reported to associate with growth traits in human and mouse. The newly detected candidate genes will advance the understanding of growth related traits and the identification of the novel variants will suggest a potential use in pig genomic breeding programs.


INTRODUCTION
Growth rate is a vital economic trait and significantly affected pig production (Fontanesi et al., 2014;Ding et al., 2018). It is usually measured by Average Daily Gain (ADG), which is average daily weight gain within a certain period as well as Age (AGE) that adjusted to a certain weight. ADG and AGE are directly related to the pig growth and commonly used in pig breeding. Both traits are in moderate heritability and can be efficiently selected by modern breeding techniques (Hoque et al., 2007).
Many growth traits relevant candidate genes were identified since the development of sequencing technology. Up to now, 609 Quantitative Trait Loci (QTLs) were reported that associated with ADG and AGE (Hu et al., 2016). Previous researches have shown a series of candidate genes of ADG and AGE. XIRP2 and MC4R gene were reported that associated with both ADG and residual feed intake (RFI) in pure Duroc population (Onteru et al., 2013;Do et al., 2014). The SOGA1 gene and ZFPM2 gene were identified to associate with both ADG and AGE in pure Duroc population (Do et al., 2014;Meng et al., 2017). The IGSF3 and IGF2 gene were detected to associated with ADG in an Italian Large White pig population (Van Laere et al., 2003;Fontanesi et al., 2014).
The genetic architectures of growth traits are usually complex and generally controlled by multiple gene. With the developments of cost effective genotyping technology, Genomewide association study (GWAS) has been widely used for mapping candidate genes of complex traits (Cantor et al., 2010;Korte and Farlow, 2013). Mixed linear model (MLM), which simultaneously incorporates principal components as fixed effects and individual additive effects as random effects, has become one of the most popular models used in GWAS . Multiple algorithms have been developed to boost both the computational efficiency and statistical power of MLM methods (Kang et al., 2008;Zhou and Stephens, 2012). However, previous studies indicated that the confounding problem decreased the power of MLM for detecting candidate genes that associated with population structure, especially in a population with explicit population structure. Many GWAS analyses for growth traits were carried out in single breed with limited sample size . Therefore, the candidate genes that contributed to the variation among breeds were always missed. Our previous study introduced FarmCPU method, which is a multiple loci model and split the MLM into separated fixed effect model and random effect model and iteratively uses above two models (Liu et al., 2016). A series of studies in both livestock and plant indicated that FarmCPU detected more candidate genes by solving the confounding problem (Kaler et al., 2017;Kusmec et al., 2017;Meng et al., 2017;Wang et al., 2018).
In this study, GWAS was conducted in a population of 4,865 pigs composed of four pure breeds (Duroc, Yorkshire, Landrace, and Pietrain) for mapping candidate genes underlying ADG and AGE traits. Estimation of genetic parameters, such as heritability was also carried out. The detected candidate genes, potential breeding markers, and the better understanding on genetic architectures of ADG and AGE will benefit breeding programs.

Animals and Phenotypes
Animals used in this study were from two isolated farms, and composed of four pure breeds, including Duroc, Yorkshire, Landrace, and Pietrain (Supplementary Table S1). ADG and AGE were measured from 70 to 115 kg, and then adjusted to 100 kg. AGE was adjusted to 100 kg using formula below: In total, 31,173 phenotypic observations from farm 1 and 24,374 phenotypic observations from farm 2 were recorded. Tail and ear tissue samples were collected and preserved with 75% alcohol and were stored in −20 • C freezers.

Genotyping, Imputation, and Quality Control
Genomic DNA was extracted from frozen collected ear tissue samples using Tecan Freedom EVO NGS workstation and TIANGEN magnetic animal tissue genomic DNA kit. DNA samples with concentration ≥40ng/µl, quantity ≥ 1 µg, and passed gel electrophoresis test were used for genotyping. A total of 4,865 DNA samples, including 1,595 samples from farm 1 and 3,270 samples from farm 2 were genotyped by Illumina PorcineSNP50 Bead Chip, which includes 50,697 SNPs, and all SNP markers were mapped to Sus scrofa genome build 11.1 (Ramos et al., 2009). The data in PLINK binary format is available at https://figshare.com/articles/pig-growth-data_zip/7533020.
Missing genotype data were fully imputed by FImpute version 2.2 (Sargolzaei et al., 2014). Data quality control was performed by PLINK and G2P software before and after imputation (Chang et al., 2015;Tang and Liu, 2019). The data was filtered by genotype call rate >0.9, Minor Allele Frequency (MAF) < 0.01, and individuals with SNP marker call rate <0.9 and missing phenotypic records were also removed. The remaining data was used for the subsequent analysis (Supplementary Table S1).

Estimation of Genetic Parameters
Genetic parameters, including genetic variance, residual variance, and heritability were estimated using Average Information -Restricted Maximum Likelihood algorithm (AI-REML) in an R procedure (Gilmour et al., 1995). The parameters were estimated by three methods, including pedigree-based Best Linear Unbiased Prediction (aBLUP), genomic BLUP (gBLUP), and single-step BLUP (ssBLUP) (Henderson, 1975;VanRaden, 2008;Aguilar et al., 2010). All BLUP models can be written as: where y is a vector of phenotypic observations; b and u represent fixed effects and breeding values, respectively; X and Z were design matrices for b and u, respectively; e represents the residual error vector with a normal distribution of e∼N(0,Iσ 2 e ), where I was an identity matrix and σ 2 e is residual variance. In aBLUP model, u∼N(0,Aσ 2 u ), in which σ 2 u is additive genetic variance and A is an additive genetic relationship matrix that derived from pedigree records; In gBLUP model, A is replaced by G and u∼N(0,Gσ 2 u ), G is derived from genomic information and can be constructed by VanRaden method: where M is a m×n standardized genotype matrix, m is the marker size and n is the number of genotyped individuals, p i is the minor allele frequency of the i th genetic marker; In ssBLUP method, u∼(0,Hσ 2 u ) and the relationship matrix H is derived from both pedigree records and genomic information simultaneously using the following equation: Individuals are assigned to different groups based on available information. The group with footer 1 represents individuals that only have pedigree information and group with footer 2 represents individuals that have both pedigree and genomic information. The A 11 and A 22 represent the relationship among individuals within group 1 and group 2, respectively, A 12 represents the relationship among individuals between group 1 and group 2 and A 21 is the transpose of A 12 , α is the ratio for combing G and A 22 matrix and set to 0.05 in this study.

Genome-Wide Association Study
Association tests were performed by a multi loci model, FarmCPU (Liu et al., 2016). FarmCPU model iteratively uses fixed effect model and random effect model. The top three columns of principal components, sex, farm effects, and pseudo QTNs (Quantitative Trait Nucleotides) were added as covariates in the fixed effect model for association tests and the model can be written as: where y is phenotypic observation vector; P is a matrix of fixed effects, including top three principal components, sex, and farm effects; M t is the genotype matrix of t pseudo QTNs that used as fixed effects; b p and b t are the relevant design matrices for P and M t , respectively; S i is the ith marker to be tested and d j is the corresponding effect; e is the residual effect vector and e∼N(0,Iσ 2 e ). Random effect model is used for selecting the most appropriate pseudo QTNs. The model is written as: y=u+e where y and e stay the same as in fixed effect model; u is the genetic effect and u∼N(0,Kσ 2 u ), in which K is the relationship matrix that defined by pseudo QTNs.
The Bonferroni correction threshold for multiple tests was used for detecting the genome-wide significant SNPs, which defined as α/K (α = 0.05 and K is the number of SNPs) (Nicodemus et al., 2005).

Variance Explained by Candidate Regions
Candidate region is defined as the genomic region that located within 1 Mb upstream and downstream of the genomewide significant SNPs. The proportion of genetic variance and phenotypic variance explained by candidate regions are estimated in a mixed linear model with multiple random effects. The model can be written as: where y, Xb, and e are the same as described in 2.3. u i represents the ith random effect and u i ∼N(0,k i σ 2 ui ), in which k i and σ 2 ui represent the variance covariance matrix and variance of the ith random effect, respectively. In this study, k 1 is a relationship matrix that defined by all SNPs within the candidate regions and k 2 is constructed by the rest of SNPs. The ratio between σ 2 u1 and the sum of σ 2 u1 and σ 2 u2 is defined as the proportion of genetic variance explained by candidate regions. The proportion of phenotypic variance explained by candidate regions can be calculated as the ratio between σ 2 u1 and the phenotypic variance. Additionally, the variance explained by randomly selected genomic regions was recognized as null distribution and compared with the variance explained by the candidate regions.

Identification of Candidate Genes
The candidate genes nearby the genome-wide significant SNPs were identified by Ensembl database using the gene annotation information of Sus scrofa genome version v11.1 (www. ensembl.org/biomart/). The genomic locations were downloaded from https://www.animalgenome.org/pig/ and genes within the candidate region are considered as candidate genes.

Summary Information of Phenotype Data, Genotype Data, and Population Structure
Summary statistics of both ADG and AGE were analyzed and shown in Table 1 and Supplementary Table S2

Estimation of Genetic Parameters
Genetic variance, residual variance, and heritability of ADG and AGE that adjusted to 100 kg on two farms were estimated by AI-REML using three models for data of all four breeds. Heritability was calculated by dividing genetic variance with the sum of genetic variance and residual variance. All estimated genetic parameters are shown in Table 2. The heritability estimations by the three methods are ranged between 0.408 and 0.562, and 0.444 and 0.572 for ADG and AGE, respectively. The results indicate that the heritabilities of ADG and AGE in two farms are similar and Farm2 has higher phenotypic variation compared with Farm1.

GWAS Results of ADG and AGE for All Breeds
For ADG trait, 5 genome-wide significant SNPs were detected and the candidate regions are located on chromosome 1, 3, 6, 10, and 13 (Figure 1). Linkage disequilibrium (LD) analysis was conducted using pooling data of the four breeds and the LD decay is shown in Supplementary Figure S5. The results indicate that LD decay tends to be stable when the distance is 1Mb. Therefore, genes that located within 1 Mb nearby the significant SNPs are defined as candidate genes. The detailed information of top ten significant SNPs, including SNP ID, Chromosome (Chr), Physical positions, P-value, and candidate genes are provided in Table 3.
We consider the genomic region that located within 1 Mb distance on either side of the significant SNPs as candidate region. For ADG, the five candidate regions explained 4.08 and 1.90% of the genetic and phenotypic variance, respectively. In contrast, a randomly selected five regions across the whole genome repeated for 50 times, on average, five randomly selected genomic regions could only explain 0.663 and 0.311% of the genetic and phenotypic variance, respectively. The detailed information is shown in Table 4. Six genome-wide significant SNPs were detected in AGE trait and detailed information of top ten significant SNPs is shown in Table 5. These candidate genomic regions explained 8.09 and 3.52% of the total genetic and phenotypic variance for AGE, respectively. In comparison, a randomly selected six regions across the whole genome repeated for 50 times, on average, can only explained 0.669 and 0.304% of the genetic variance and phenotypic variance (Table 4), respectively. Due to the genetic correlation between ADG and AGE, two SNPs that located on chromosome 1 and chromosome 6 are detected in both traits and marked with dotted line in Figure 1.

GWAS Results of ADG and AGE for Single Breed
GWAS analyzes for single breed were also carried out. Six genome-wide significant SNPs were detected for both ADG and AGE traits by Duroc population (Figure 2

DISCUSSION
Most target economic traits in livestock are quantitative traits and usually with a complex genetic architectures. Therefore, revealing the candidate genes underlying these traits is always the attractive area of research in livestock genetics and breeding (Goddard and Hayes, 2009;Hou and Zhao, 2013). Since the GWAS research on age-related macular degeneration of the retina was published, GWAS has become one of the most popular method for identifying candidate genes that associated   (Klein et al., 2005). With the developments of commercialized high density SNP Chips for multiple livestock species, the GWAS has been widely used in livestock research and breeding (Zhang et al., 2012;Wang et al., 2017). In this study, the ADG and AGE traits of pigs from two separated farms were obtained. A population of 5,000 individuals of four pure breeds were genotyped by 50K SNP Chip. For all breeds, a total of 4,260 individuals with 47,157 SNPs were used for association tests using the FarmCPU method, which is a multi-loci GWAS model. For both ADG and AGE, nine genome-wide significant SNPs were detected using data of all four breeds, including two common detected signals. The candidate regions were evaluated in a mixed linear model with multiple random effects for the variance contribution and explained about 4-8% of the genetic variance and 2-3% of the phenotypic variance. ADG and AGE are perfectly correlated after take the log transformation. A trait and its inverse values or log transformation values take different assumptions for the genetic architectures. For a trait with complex genetic architecture, common used hypothesis assume that the effects of genes are linear. However, there exist some restrictions of the phenotype and some genes that don't directly affect the trait may also restrict its maximum or minimum. The effects of genes are not always equally contributed to every level of the trait and the effects of genes are non-linear sometimes. The linear assumption fits perfectly near the average value of traits, but it may lose its fitness near the boundary of the trait. The non-linear assumption utilizes the information near boundary more effectively. Due to the inverse relationship of ADG and AGE, the analyzes on both traits seem like to use different modeling way to detect the candidate genes underlying the pig growth rate. From the results, we can also see that ADG and AGE have some common detected signals and some disagreements were also found as well.
The heritabilities of both ADG and AGE traits were estimated by three models including aBLUP, gBLUP and ssBLUP using the pooling data of the four breeds. The heritabilities of ADG and AGE traits are 0.41∼0.56 and 0.44∼0.57, respectively. Previous research also reported that ADG is a moderate heritability trait and its heritability is approximately 0.45∼0.49 in a Duroc population (Gilbert et al., 2007;Hoque et al., 2009). Comparing the heritabilities that estimated by three models, the standard error of gBLUP model is higher than the  other two models because of the small genotyped population size (Gutierrez et al., 2018). The ssBLUP method utilizes the pedigree and genomic information simultaneously and the estimated genetic parameters are theoretically more accurate (Meuwissen et al., 2016).
Using data of all breeds, there are two SNPs detected in both ADG and AGE traits, which are WU_10.2_1_179575045 on chromosome 1 and Affx-114707063 on chromosome 6. GRP, SEC11C (SEC11 homolog C), and ssc-mir-122 were identified in the nearby candidate regions on chromosome 1. GRP is a regulatory neuropeptide that stimulates gastric G cells to release gastrin and regulate gastric acid secretion and intestinal movement, which affects the food intake and may lead to anorexia, bulimia, and obesity if lacked of the genes (Merali et al., 1999). SEC11C is a homolog of SEC11, which is the only essential factor for signal peptide processing and plays an important role in protein processing, localization, and secretion. The lack of SEC11 will cause serious growth defects (Bohni et al., 1988). The sscmir-122 is a highly evolutionarily conserved miRNA and it is an important regulator of lipid metabolism (Esau et al., 2006). A previous experiment shows that the weight and cholesterol levels are increased with the decreased expression of ssc-mir-122 by feeding high-level cholesterol feed to mini pigs (Cirera et al., 2010). Previous published researches also verified that MC4R FIGURE 3 | Manhattan plots and Quantile-Quantile (QQ) plots for both ADG and AGE traits of Landrace breed. The signals of genome-wide significant SNPs are colored in red.
FIGURE 4 | Manhattan plots and Quantile-Quantile (QQ) plots for both ADG and AGE traits of Yorkshire breed. The signals of genome-wide significant SNPs are colored in red. and GTF3C5 that located on chromosome 1, are candidate genes for growth traits (Onteru et al., 2013;Quan et al., 2018). For Affx-114707063 SNP on chromosome 6, the identified nearby candidate gene is TTR, which is the main carrier of thyroid hormone (T4) and plays an important role on growth and energy metabolism (Richardson et al., 2007). The concentration of TTR in the blood can be used to assess nutritional status. Because of its short half-life, the concentration can better reflect the nutritional levels of recent dietary intake (Shenkin, 2006). Some researches verified that PGM1 (Phosphoglucomutase-1) and FTO (Fat mass and obesity-associated protein) that located on chromosome 6 significantly affected muscle development and obesity, respectively (Onteru et al., 2013;Fontanesi et al., 2014).
The CNR1 gene located on chromosome 1, which next to the significant SNP MARC0065740, is an appetite-related gene and appetite is one of the important factors influencing growth rate. CNR1 is the major receptor of anandamide that inhibits gastrointestinal activity (Mathison et al., 2004). The CNR1 receptor inverse agonist rimonabant has been proved to reduce the food intake in both human and mouse. In addition, when the stomach is in contraction, which is a relatively active state, CNR1 will promote the release of ghrelin, which increases the palatability of food, and it is the origin of the appetite stimulating effect (De Luca et al., 2012).
The SNP density limits the interpretation of GWAS hits due to the different LD between SNP and QTLs in various breeds. Therefore, GWAS was carried out on data of four breeds separately as well. Two signals that detected in Duroc population were consisted with GWAS results of all breeds and explained the variability of the traits among breeds and within Duroc breed. In addition, several genome-wide significant SNPs were detected and contributed to the variability of traits within breeds. For Duroc breed, the GLP1R gene is identified nearby the genomewide significant SNP MARC0015804. It is a member of the glucagon receptor family and controls the blood sugar level (Brubaker and Drucker, 2002), in addition, it is also expressed in the brain and has a role in appetite control (Kinzig et al., 2002). The TRIB3 gene, which is identified nearby the M1GA0027226 marker on chromosome 17, was reported to be associated with meat production traits in Italian heavy pigs (Fontanesi et al., 2010). For Landrace breed, BRD2 and HCRTR2 genes were identified nearby M1GA0027226 SNP on chromosome 7. The knockout of BRD2 gene will cause obesity in mouse and HCRTR2 is a kind of orexin receptor, which plays a vital important role in feeding behavior and balance of energy metabolism (Spinazzi et al., 2006;Belkina and Denis, 2012).

CONCLUSIONS
In this study, we performed GWAS on approximately 5,000 purebred pigs that composed of four breeds from two separated farms. All samples were genotyped by 50K SNP Chip. GWAS was performed on ADG and AGE that adjusted to 100 kg using FarmCPU model. A total of 27 genome-wide significant SNPs were detected and two of them were detected in both traits using the data of all breeds and Duroc breed. Nine candidate genes were detected within 1 Mb nearby the genomewide significant SNPs. Among them, TRIB3 on chromosome 17 was reported to be associated with meat production traits in pigs; GRP, CNR1, SEC11C, and ssc-mir-122 on chromosome 1, TTR on chromosome 6 and GLP1R, BRD2, and HCRTR2 gene on chromosome 7 were reported to be associated with growth traits in human and mouse. The newly detected significant SNPs and newly identified candidate genes in this study can be applied to pig breeding and the information could be also incorporated in genomic selection for ADG and AGE traits to achieve faster growth.

ETHICS STATEMENT
This study was carried out in accordance with the guidelines of the science ethics committee of the Huazhong Agricultural University (HZAU). The science ethics committee of the HZAU approved all the animal experiments that conducted in this study.