Identification of SNPs and Candidate Genes for Milk Production Ability in Yorkshire Pigs

Sow milk production ability is an important limiting factor impacting suboptimal growth and the survival of piglets. Through pig genetic improvement, litter sizes have been increased. Larger litters need more suckling mammary glands, which results in increased milk from the lactating sow. Hence, there is much significance to exploring sow lactation performance. For milk production ability, it is not practical to directly measure the milk yield, we used litter weight gain (LWG) throughout sow lactation as an indicator. In this study, we estimated the heritability of LWG, namely, 0.18 ± 0.07. We then performed a GWAS, and detected seven significant SNPs, namely, Sus scrofa Chromosome (SSC) 2: ASGA0010040 (p = 7.73E-11); SSC2:MARC0029355 (p = 1.30E-08), SSC6: WU_10.2_6_65751151 (p = 1.32E-10), SSC7: MARC0058875 (p = 4.99E-09), SSC10: WU_10.2_10_49571394 (p = 6.79E-08), SSC11: M1GA0014659 (p = 1.19E-07), and SSC15: MARC0042106 (p = 1.16E-07). We performed the distribution of phenotypes corresponding to the genotypes of seven significant SNPs and showed that ASGA0010040, MARC0029355, MARC0058875, WU_10.2_10_49571394, M1GA0014659, and MARC0042106 had extreme phenotypic values that corresponded to the homozygous genotypes, while the intermediate values corresponded to the heterozygous genotypes. We screened for flanking regions ± 200 kb nearby the seven significant SNPs, and identified 38 genes in total. Among them, 28 of the candidates were involved in lactose metabolism, colostrum immunity, milk protein, and milk fat by functional enrichment analysis. Through the combined analysis between 28 candidate genes and transcriptome data of the sow mammary gland, we found nine commons (ANO3, MUC15, DISP3, FBXO6, CLCN6, HLA-DRA, SLA-DRB1, SLA-DQB1, and SLA-DQA1). Furthermore, by comparing the chromosome positions of the candidate genes with the quantitative trait locus (QTLs) as previously reported, a total of 17 genes were found to be within 0.86–94.02 Mb of the reported QTLs for sow milk production ability, in which, NAV2 was found to be located with 0.86 Mb of the QTL region ssc2: 40936355. In conclusion, we identified seven significant SNPs located on SSC2, 6, 7, 10, 11, and 15, and propose 28 candidate genes for the ability to produce milk in Yorkshire pigs, 10 of which were key candidates.


INTRODUCTION
The mammary gland is a ubiquitous morphological feature of mammals, and lactation is an essential process in mammalian reproduction, including the secretion of milk from mammary glands. For offspring, depending on milk is a key strategy to the life history of all mammals. During lactation, maintaining body growth and milk production for the dam is necessary, thus energy requirement is high. In the past few decades, genetic and management changes have occurred, and the modern sow is subject to additional challenges. Litter size is one of the most important factors affecting milk production in a sow (Eissen et al., 2000), and piglet survival after birth is negatively affected by increasing litter size . During this period, the litter size of pigs has increased and will continue as an important goal trait in pig breeding programs around the world (Spötter and Distl, 2006;Baxter et al., 2013). In general, larger litters need more suckle mammary glands, which results in increased milk from the lactating sow (Auldist et al., 1998). The survival of offspring can be enhanced by milk yield, which satisfies the immunological needs of offspring and assists in the endocrine maturation of neonates (Goldman, 2002). In response to greater suckling intensity, sows have to produce more milk to nurse more piglets (Auldist et al., 1998;Revell et al., 1998). Additionally, poor lactation traits lead to early culling, which affects the profitability of commercial producers. Hence, it is of economic importance to improve lactation performance in pigs, and it is necessary to include lactation traits in the breeding goals.
The genetic improvement of sow lactation performance is hindered due to the difficulty of collecting accurate phenotypes. Unlike dairy cattle, it is not possible to directly measure the sow milk yield. Different experimental methods have been proposed to measure pig milk production ability, such as the isotope dilution method (Pettigrew et al., 1987) and the weighsuckle-weigh method (Elsley, 1971). These methods are expensive, complicated, and laborintensive, and are difficult to be implemented on a routine basis in a commercial herd. A simpler and more straightforward measurement for an increase in body weight of piglets during lactation has been reported and is considered as an indicator trait for milk production ability (Revell et al., 1998;Bergsma et al., 2008). In 2016, DM. Thekkoot et al. estimated the heritability of litter weight gain (LWG) as an indicator of lactation trait in Yorkshire and Landrace sows, namely 0.16-0.22 and 0.12-0.20, respectively (Thekkoot et al., 2016a).
A Genome-Wide Association Study (GWAS) is an effective strategy to examine the underlying genetics of complex traits (Goddard and Hayes, 2009). Many studies have identified candidate markers associated with important economic traits in pigs, such as meat quality (Falker-Gieske et al., 2019) and growth (Zhang et al., 2019). For LWG traits in Yorkshire sow lactation, the GWAS detects two quantitative trait locus (QTLs) on Sus scrofa Chromosome (SSC) 7 (126 and 101 Mb) (Thekkoot et al., 2016b).
Until now, there has been little known about the heritability and genomic prediction of sow milk production ability. In this study, we aimed to estimate the heritability of LWG of the sow during lactation, to perform a GWAS for proposing the single nucleotide polymorphisms (SNPs) and candidate genes, and to conduct the combined analysis with the reported swine mammary gland transcriptome data and GWAS data for further insights into the candidates involved in sow milk synthesis.

Animals and Phenotypic Data
In this study, a total of 985 Yorkshire sows involved in 96 sire families, were recorded between 2019 and 2020 in Shanxi and Liaoning Province, China. These sows were fed with the fodders prescribed by their farms, in which, the regular quarantine inspection was carried out. For each sow, only one production record was performed, and 985 individuals were involved in 1-8 parity.
As it was not practical to directly measure the milk production ability of sows, our study weighed all non-mummified piglets at birth, death, weaning, and at the time of fostering. This allowed us to quantify the exact weight gain of each piglet for each sow. We calculated the LWG for each sow by summing up the increase in weight of all piglets nursed by that sow and considered it as a potential indicator for milk production ability. The formula for calculating LWG was as follow:

Genetic Parameters Estimation for LWG
We estimated the genetic parameters of LWG with an animal model. The genetic parameters and estimated breeding values (EBV) were performed by the ASReml package as the following model: where y is a vector of phenotypic records (LWG of the sow); b is a vector of fixed effects containing herd by farm and production batch (nine levels), parity (five levels: 1, 2, 3, 4, and 5-8), and days of lactation (three levels: ≤ 18, 19-21, and > 21); X is a design matrix that associates b with y; a is the vector of additive genetic effects; Z is the corresponding incidence matrix, and e is the vector of random residual effects. Variances of random effects are defined as V(a) Gσ 2 a for the polygenes and V(e) Iσ 2 e for the residuals, where the G is the additive genetic relationship matrix, I is the identity matrix, V(a) is the additive genetic variance, and V(e) is the residual variance. In this study, 985 sows were traced back to four-generation pedigrees to construct the kinship matrix, and a total of 2,415 individuals were included.

Genotyping and Quality Control
Ear samples of the 985 Yorkshire sows were collected in farms. For each ear, DNA was isolated with a commercially available kit, Q1Aamp DNA Mini Kit (QIAGEN, Germany). In total, 985 sows were then genotyped with the GenSeek Genomic Profiler (GGP) Porcine 50K (50,697 SNPs, Illumina, San Diego, CA, United States).
With PLINK (Purcell et al., 2007), we removed the SNPs with minor allele frequencies < 0.01, and a deviation from Hardy-Weinberg equilibrium (HWF) p values < 0.001. A dataset containing 36,871 SNPs and 985 animals was used for further analysis. All SNP positions were annotated based on pig genome assembly Sscrofa 11.1. The genotype data used for GWAS was submitted to public repositories, and the DOI was 10.6084/m9. figshare.16545915 (https://figshare.com/s/ edda38a1c99aa7ab7ae0).

Genome-Wide Association Study
We utilized the EBV of LWG as the dependent variable to perform GWAS by Fixed and random effect model Circulating Probability Unification (FarmCPU). FarmCPU is a multi-locus model that incorporates multiple markers simultaneously as covariates to partially remove the confounding effect between testing markers and kinship (Liu et al., 2016). A genome-wide Bonferroni correction threshold of 0.05/36,871 (i.e., 1.36E-06) was implemented to correct for multiple testing and assess the significance level for each SNP. The Manhattan and quantilequantile (Q-Q) plots were drawn by R packages (http://cran.rproject.org/web/packages/gap/index.html).
In addition, we estimated the least square mean of sow LWG phenotypes for homozygous and heterozygous genotypes of the seven significant SNPs with standard error (SE) by SAS9.2 (SAS Institute, Cary, NC, United States).

Gene Contents and Functional Annotation
We used the BioMart in Ensembl database to retrieve candidate genes within 200 kb (Zhao et al., 2011) of significant SNPs based on the pig reference genome (Sscrofa11.1). To provide insight into the functional enrichment of candidate genes identified in this study, we performed gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis with the KOBAS (http://kobas.cbi.pku.edu.cn/kobas3/genelist/) (Xie et al., 2011).

Combined Analysis With the Reported Transcriptome and GWAS Data
To further confirm the key candidates, we performed the combined analysis between the results of this study and reported transcriptome research of the sow mammary gland (Palombo et al., 2018).
Based on the gene location information in the Ensembl database (http://asia.ensembl.org/index.html) and reported GWAS, it was considered that the candidate genes located within 5 Mb to the peak of QTLs in the previous GWAS were promising candidates associated with the ability to produce milk.

Descriptive Statistics and Heritability of LWG Trait
For 985 Yorkshire pigs, the average days of lactation were 19.13. We calculated the descriptive statistics of LWG throughout lactation: number sows ∼ 985, mean ∼ 51.65 kg, standard deviation ∼ 16.05, maximum ∼ 98.74 kg, and minimum ∼ 5.54 kg. Figure 1 shows the distribution of LWG, which indicated the data was normal.
We estimated the heritability of sow LWG: 0.18 ± 0.07, in which, the estimated additive variance was V(a) 25.95 ± 10.85, and residual variance was V(e) 119.69 ± 10.30. Furthermore, we estimated the breeding value and include the results in Supplementary Table S1.
We performed the distribution of phenotypes for LWG by the genotype of the significant SNPs, the results of which can be seen in Figure 3. These data of ASGA0010040, MARC0029355, MARC0058875, WU_10.2_10_49571394, M1GA0014659, and MARC0042106 showed that the extreme phenotypic values corresponded to the homozygous genotypes, while the intermediate values corresponded to the heterozygous genotypes. The least-square mean (± SE) of the LWG by seven significant SNPs is shown in Table 2, which also presents the genotype and allele frequencies.
Sows that were homozygous AA for ASGA0010040 showed significantly lower LWG than those that were homozygous GG (p < 0.01) and heterozygous AG (p < 0.05). The homozygous AA for MARC0058875 showed significantly larger LWG than those with homozygous GG (p < 0.01) and heterozygous AG (p < 0.01). The homozygous AA for M1GA0014659 showed significantly larger LWG than those with homozygous GG (p < 0.05). Sows that were homozygous AA for WU_10.2_6_65751151 and AG for MARC0042106 showed significantly larger milk production ability than those that were heterozygous AG (p < 0.01) and homozygous AA (p < 0.05), respectively. The SNPs MARC0029355 and WU_10.2_10_49571394 were not significant, while the homozygous GG for MARC0029355 and AA for WU_10.2_10_49571394 had obvious larger LWG than those with homozygous AA and GG, respectively. These results further confirmed that the seven SNPs were highly associated with sow milk production ability.

Functional Analysis of Candidate Genes
To investigate the functions of 38 genes, we performed GO and KEGG pathway analysis by KOBAS. In total, 142 GO and 51 KEGG enrichments were clustered with 28 genes (Supplementary Table S2). All these GO and KEGG enrichments were mainly related to cellular components and basic metabolism. In which, many GO and KEGG enrichments were involved in lactose metabolism, colostrum immunity, and milk protein and fat, such as tetrahydrofolate interconversion, thermogenesis, oxytocin signaling pathway, antigen processing and presentation, primary immunodeficiency, immune system process, glycoprotein catabolic process, cGMP-PKG signaling pathway, fat cell differentiation, and MAPK signaling pathway (Supplementary Table S2). Additionally, there were also many important metabolism enrichments clustered by these genes, including chloride channel activity, ubiquitin-mediated proteolysis, regulation of cell growth, carbon metabolism, metabolic pathways, ATP binding, and oxidation-reduction process (Supplementary Table S2). According to the results of the GO and KEGG enrichments, we considered the 28 genes as candidates for lactose metabolism, colostrum immunity, and milk protein and fat (Supplementary Table S2).

Combined Analysis With the Reported Transcriptome of Swine Mammary Gland and GWAS Data of Sow Milk Production Ability
To further detect insights into the association of 28 candidate genes with milk synthesis, we performed the combined analysis between this GWAS and reported transcriptome data (Palombo et al., 2018) to improve the accuracy of the selection of functional genes related to milk production in swine. In total, nine (ANO3, MUC15, DISP3, FBXO6, CLCN6, HLA-DRA, SLA-DRB1, SLA-DQB1, and SLA-DQA1) of 28 candidates were differentially expressed genes at days 14, 10, 6, and 2 before (−) parturition and day 1 after (+) parturition (Table 3).
We also compared the chromosome positions of 28 candidates with those of the QTLs from reported GWAS data for milk production ability traits, and a total of 17 genes were found to be within 0.86-94.02 Mb of the reported QTLs for milk yield ( Table 3). In which, NAV2 was found to be located with 0.86 Mb of QTL region ssc2: 40936355 that was confirmed to have large genetic effects on sow milk yield ( Table 3).

DISCUSSION
In this study, we estimated the heritability and EBV of LWG and performed a GWAS to screen the candidate genes. We found 28 promising candidates involved in lactose metabolism, colostrum immunity, and milk protein and fat, such as tetrahydrofolate interconversion, primary immunodeficiency, glycoprotein catabolic process, fat cell differentiation, and MAPK signaling pathway.
Our heritability estimates for LWG were 0.18 and were consistent with those reported by DM. Thekkoot, who found the heritability of LWG ranged from 0.16 to 0.22 for Yorkshire and 0.12-0.20 for Landrace sows (Thekkoot et al., 2016a). We performed the GWAS and proposed seven significant SNPs associated with sow milk production ability. By the estimation of least-square means, ASGA0010040, MARC0058875, WU_10.2_10_49571394, M1GA0014659, and MARC0042106 were found that the extreme phenotypic values significantly corresponded to the homozygous genotypes. Sows that were genotyped for MARC0029355 and WU_10.2_10_49571394 had an obvious phenotype trend between two different homozygous, while not significant. This might be due to the high SE. The lactation process includes initiation and maintenance, which are mainly regulated by hormone-nerve. Milk production is highly influenced by the sow's body reserves at the start of lactation as well as the degree and type of body tissues that are mobilized during lactation (Costermans et al., 2020). Selection for high prolificacy in modern sows has led to increased litter size and a higher number of piglets weaned per litter, which results in greater metabolic demands during lactation, due to a higher milk production (Kemp et al., 2018). In our research, we found the candidate genes were enriched mainly in metabolism-related functions, especially in processes involving carbohydrates, ATP, lipids, and protein processes. In addition, we also found that these candidate genes were involved in colostrum immune processes and milk synthesis.
By the combined analysis with the swine mammary gland transcriptome data, nine genes were identified to be key candidates. By the combined analysis with the reported GWAS data, the NAV2 gene was found to be located with 0.86 Mb of the reported QTL region ssc2: 40936355. We comprehensively analyzed the results of functional enrichments, the swine mammary gland transcriptome, and previous GWAS data, which revealed that 28 candidate genes were associated with swine milk production, and 10 of them were key candidates.
For the 10 key candidate genes, NAV2 was mainly enriched into Na (+) channel (Mishra et al., 2015), nervous system development (Clagett-Dame et al., 2006;Yan et al., 2015;Pook et al., 2020), and delayed age of menopause among women (Bae et al., 2019). In all brain regions studied, the levels of NAV2 observed in late gestation and early postnatal life were the highest (Pook et al., 2020). It was reported that NAV2 was associated with hyperlipidemia (Sun et al., 2018a). ANO3 was associated with dystonia and motor neuron dysfunction (García-Hernández et al., 2021). The glycoprotein MUC15 was initially isolated from the bovine milk fat globule membrane and had a potential physiological function in signal transduction (Pallesen et al., 2008). MUC15 was involved in PI3K/AKT signaling pathway (Yue et al., 2020), and the localization of MUC15 was shown to be controlled by the ovarian hormones, oestrogen, and progesterone (Poon et al., 2014). DISP3 was a molecule between thyroid hormone and cholesterol metabolism, which used thyroid hormone to regulate serum cholesterol levels, thus participating in the metabolism and synthesis of various substances such as sugar, protein, fat, estradiol, and cortisol in the body (Zikova et al., 2009). DISP3 was also associated with the release of lipid-anchored secretory proteins (Katoh and Katoh, 2005). FBXO6 was related to ovarian cancer treatment (Ji et al., 2021) and glycoprotein quality control (Glenn et al., 2008). CLCN6 was involved in the renin-angiotensin-aldosterone system (Ji et al., 2017). SLA-DRA, SLA-DRB1, SLA-DQB1, and SLA-DQA1 were the SLA class Ⅱ genes involved in immune (Liu et al., 2015). SLC5A12 was an active source of lactate transmembrane transporter, which is mainly involved in sodium ion transport (Martin et al., 2007;Sivaprakasam et al., 2017). FBXO2 and MAD2L2 were involved in ubiquitination processes (Li et al., 2018;Liu et al., 2021), which regulated the milk protein and fat metabolic mechanism (Liu et al., 2020a). DRAXIN was related to Akt, which could impact milk synthesis (Meli et al., 2015;Liu et al., 2020b). AGTRAP was reported to have a functional role in adipose metabolism (Ohki et al., 2017). MTHFR was involved in the metabolism of carbon, methionine, and tetrahydrofolic acid, and was related to the metabolism of milk folic acid (Page et al., 2019). MTHFR could play a role in milk protein synthesis through folic acid (Hou et al., 2015). Studies reported that MTHFR was an important candidate gene for sheep milk yield traits (Hou et al., 2015;An et al., 2016). NPPB and BTNL2 were involved in PI3K/AKT, Ca 2+ , K + , ATP, and immunity (Fioretti et al., 2004;Dolovcak et al., 2009;Sun et al., 2018b;Zhao et al., 2020). KIAA2013 was related to DNA methylation levels of newborns (Yeung et al., 2021). HLA-DOB, PSMB8, and TAP1 were involved in immune, protein and fat metabolism processes (Nagarajan et al., 2002;Niesporek et al., 2005;Garg, 2011;Kolbus et al., 2012;Arimochi et al., 2016;Naderi et al., 2016;Moussa et al., 2018;Yang et al., 2018;Chen et al., 2020). CACNB2 was involved in the regulation of ion membrane transport, which was related to calcium channel activity, MAPK, and oxytocin signaling pathways (Durairaj Pandian et al., 2019), and studies have shown that CACNB2 was involved in the formation of porcine marlin (Bertolini et al., 2018). NSUN6 protein might have an important function in broad aspects of embryonic development (Chi and Delgado-Olguín, 2013). KIF5C was involved in the regulation of mammalian phosphorylation (Padzik et al., 2016). As the substrate of protein kinase CK2, KIF5C cloud interacts with CK2alpha to become a negative factor of adipogenesis (Schäfer et al., 2008;Chen et al., 2017). ENSSSCG00000030874, ENSSSCG00000027921, and ENSSSCG00000001447 genes were novel genes in the Ensembl database, while our functional analysis showed their roles in the immune system.
In conclusion, we identified seven SNPs significantly associated with sow milk production ability and propose 28 candidate genes. By integrated analysis of the biological functions, swine mammary gland transcriptome, and previous GWAS data, 10 genes (NAV2, ANO3, MUC15, DISP3, FBXO6, CLCN6, HLA-DRA, SLA-DQB1, HLA-DRB1, SLA-DQB1, and SLA-DQA1) were proposed to the key candidates. Our study provided a new insight for investigating the potential critical SNPs and genes involved in sow milk production, and the molecular information might be used to improve sow lactation performance.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

ETHICS STATEMENT
The animal study was reviewed and approved by the Animal experiments were approved by the Science Research Department of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China).

AUTHOR CONTRIBUTIONS
LXW and LS conceived and designed the study. LS collected the DNA and phenotype samples with the help of YL, QL, LZ, LGW, XL, HG, XH, FZ, and HY. LS analyzed the data and prepared the manuscript. All authors read and approved the final manuscript.