Rediscover and Refine QTLs for Pig Scrotal Hernia by Increasing a Specially Designed F3 Population and Using Whole-Genome Sequence Imputation Technology

Pig scrotal hernia is one of the most common congenital defects triggered by both genetic and environmental factors, leading to severe economic loss as well as poor animal welfare in the pig industry. Identification and implementation of genomic regions controlling scrotal hernia in breeding is of great appeal to reduce incidences of hernia in pig production. The aim of this study was to identify such regions or molecular markers affecting scrotal hernia in pigs. First of all, we summarized and analyzed the results of some international teams on scrotal hernia and designed a specially population which contains 246 male individuals. We then performed genome-wide association study (GWAS) in this specially designed population using two scenarios, i.e., the target panel data before and after imputation, which contain 42,365 SNPs and 18,756,672 SNPs, respectively. In addition, a series of methods including genetic differentiation analysis, linkage disequilibrium and linkage analysis (LDLA), and haplotype sharing analysis were appropriate to provide for further analysis to identify the potential gene underlying the QTL. The GWAS in this report detected a highly significant region affecting scrotal hernia within a 24.8Mb region (114.1–138.9Mb) on SSC8. And the result of genetic differentiation analysis also showed a strong genetic differentiation signal between 116.1 and 132.7Mb on SSC8. In addition, the QTL interval was refined to 2.99Mb by combining LDLA and genetic differentiation analysis. Finally, two susceptibility haplotypes were identified through haplotype sharing analysis, with one potential causal gene in it. Our study provided deeper insights into the genetic architecture of pig scrotal hernia and contributed to further fine-mapping and characterize haplotype and gene that influence scrotal hernia in pigs.


INTRODUCTION
Pig hernias are of the most common congenital defects which cause severe economic losses as well as poor animal welfare in the pig industry. The most common types of hernias in pig are scrotal and umbilical hernia. Scrotal hernia is the phenomenon of abdominal contents falling into scrotum from the unilateral or bilateral inguinal rupture, causing local expansion bulge (Grindflek et al., 2006;Du et al., 2009;Zhao et al., 2009). As a complex congenital defect, the reason of scrotal hernia formation is unclear; some abnormal phenomena and problems occurred at the stage of the development and obliteration of processus vaginalis in descent of testis, which have been considered to be the main reason for the development of scrotal hernia (Clarnette and Hutson, 1997;Clarnette et al., 1998). The genetic mechanical of scrotal hernia is also poorly clarified, only with the knowledge of cause by both multiple genetic and environmental factors. In the pig breeding industry, the occurrence of scrotal hernia is varied from 1.7 to 6.7% across from pig breeds and populations, and the heritability estimation varied from 0.2 to 0.6 in disparate studies (Mikami and Fredeen, 1979;Thaller et al., 1996). Environmental factors, as a potential factor in the occurrence of complex genetic diseases, have a great influence on the occurrence of scrotal hernia. Research reports showed that the incidence of scrotal hernia in Dutch Landrace and large white pig was 1.36 and 1.31%, respectively, while the corresponding incidence rate of Dutch Landrace and large white pig of Hypor was 0.54 and 0.22% (PK, 2006). In 2010, the European Breeding Corporation reported that the incidence of scrotal hernia in Dutch Landrace and large white pig was 0.383% (Walters, 2010). Obviously, the difference of environment will make the incidence of scrotal hernia different.
In breeding practice, it is not effective to decrease the incidence of pig scrotal hernia by conventional phenotypic selection. One of the methods of hernia resistance breeding is to isolate and identify susceptibility loci and major causative genes and then implement marker assisted selection. Currently, several research groups have identified the susceptible loci and potential positional candidate genes for scrotal hernia. Grindflek et al. reported several susceptibility QTLs for pig scrotal hernias on eight chromosomes (Grindflek et al., 2006). Ding et al. have revealed seven regions on SSC2,4,8,10,13,16,and 18 for scrotal hernia in a White Duroc and Erhualian F 2 intercross using nonparametric genome-wide linkage (NPL) analysis and transmission disequilibrium test (TDT) (Ding et al., 2009). Du et al. found that four regions surrounding ELF5, KIF18A, COL23A1 on chromosome 2, and NPTX1 on chromosome 12 may contain the genetic variants important for the development of the scrotal hernia development using a family-based analysis ). Sevillano et al. reported a susceptibility region on SSC13 between 34 and 37 Mb for scrotal hernia (Sevillano et al., 2015). However, these susceptibility areas are rarely further confirmed in other research groups; even using bigger population sizes, the genetic control of scrotal hernia has still not been clarified.
In the 10 years, we performed two statistical methods (TDT and NPL) in the F 2 population using 194 microsatellites and identified one chromosomal region distributed on SSC8 for the scrotal hernia. Generally speaking, nonparametric linkage analysis (NPL) evaluates allele sharing among affected individuals and comes to a result without particular model assumptions, and the TDT was proposed as a family-based association test for the presence of genetic linkage between a genetic marker and a trait; more computational details with this 2 statistical methods were showed by Ding et al. (2009). Using the same population, we perform GWAS study in 60K genotypes, the result manifested that none of SNPs achieved the genome-wide significance threshold . The feasible reasons for the "missing QTLs" in GWAS study probably are the low linkage disequilibrium between markers and low incidence rate in the subject population, or due to the intricacy genetic basis of this congenital defect. To overcoming these problems and exploring this congenital defect, we designed a specially F 3 population which was mated with fullsibs or half-sib of the affected individuals and imputed the chip SNPs to whole-genome sequences (Supplementary Figure S1) then implemented several classical genetic methods to rediscover and refine QTLs for pig scrotal hernia. Our aim in this study was to identify susceptibility loci of pig scrotal hernia and provided a novel insight for further analysis of the genetic basis of this congenital defect.

MATERIAL AND METHOD
All procedures including experimental animals established and tissue collection were performed in accordance with the guidelines approved by the Ministry of Agriculture of China. This study was approved by the ethics committee of Jiangxi Agricultural University.

ANIMALS OF THE TARGET POPULATION
A four-generation resource population was developed from the intercross of 2 White Duroc boars (PIC 1075) and 17 Chinese Erhualian sows between 2,000 to 2,006. In briefly, two White Duroc boars were crossed to 17 Erhualian sows, then 9 F 1 boars, and 59 F 1 sows were randomly selected to produce a total of 1,912 F 2 pigs in 6 batches avoiding full-sib mating . Last, 62 F 2 boars and 149 F 2 sows were selected to produce two types of F 3 population. The ordinary experiment population contains 661 F 3 offspring from an intercross of randomly chosen F 2 avoiding full-sib mating; the particular hernia population in this study contains 851 F 3 offspring, which were designed to mate the health full-sibs or half-sibs of affected individuals. Affected pigs were diagnosed and recorded carefully by veterinarians at three age stages: 46, 90, and 240 days. In summary, 23 affected pigs from F 2 population were confirmed, 5 affected pigs from ordinary F 3 population, and 23 affected pigs from F 3 hernia study population were diagnosed, respectively. A total of 1,020 individuals (19 F 0 , 68 F 1 , and 933 F 2 ) and 500 F 3 were genotyped. For this study, 246 male F 3 pigs were chosen for GWAS analysis, which contain 18 available DNA samples for affected individuals. Furthermore, 19 F 0 , 68 F 1 , and 516 F 2 male pigs, and 246 F 3 male pigs were used in haplotype sharing analysis.
Genomic DNA was isolated from ear tissue with a standard phenol/chloroform extraction method. All DNA samples were qualified and diluted to a final concentration of 50 ng/µl in 96-well plates. A total of 1,020 F 2 and 500 F 3 were genotyped with the Illumina PorcineSNP60 BeadChip and GeneSeek GGP Porcine 50K BeadChip on an iScan System (Illumina, USA) following the manufacturer's protocol, respectively (Ramos et al., 2009). Physical positions of SNPs on chromosomes referred to the swine reference genome sequence assembly (Sus_scrofa11.1) (http://asia.ensembl.org/Sus_scrofa/Info/Index). Quality control procedures were implemented by PLINK (version 1.07). Briefly, SNPs were removed if their positions on the genome build 11.1 were unspecific, call rate <90%, and minor allele frequency (MAF) <1%. Animals more than 10% missing genotypes were removed. To keep the alleles consistency with the sequencing data, we firstly aligned the primer sequences of each SNP to the reference porcine genome assembly Sus scrofa 11.1 by BLAST. Then, the genotypes of reversed SNP strands in target panel were flipped using PLINK (v1.9) software (Chang et al., 2015); SNPs without positions were excluded for further analysis.

HAPLOTYPE CONSTRUCTION OF REFERENCE PANEL
In this study, a wide collection of 109 whole-genome sequence individuals from 14 difference populations were used as a reference; each breed contained 2 to 22 individuals. More details on the origins, breeds, and sample size are shown in Table 1. We firstly trimmed the raw reads according to a quality score threshold greater than 15; then, BWA (Burrows-Wheeler Aligner) was used to align the raw reads which passed chastity filtering to the reference porcine genome assembly Sus scrofa11.1 (Li and Durbin, 2009). Variants were identified using the GATK (Genome Analysis Toolkit) (McKenna et al., 2010); PCR duplications were firstly marked by Picard MarkDuplicates (http://broadinstitute. github.io/picard/), and GATK IndelRealigner option was carried out for local realignments. Then, variants were filtered with GATK VariantFiltration option. VCFtools was used to remove the structural variants. Subsequently, the haplotypes of 109 individuals with cleaned SNP data were constructed by Beagle (v4.1) (Browning and Browning, 2007). Specifically, the number of markers to include in each sliding window was set to 100,000, and the overlap between windows was set to 3,000 markers. Then, the number of phasing iterations was set to 50. Finally, the other options involving in the imputation follow the default setting.

IMPUTATION
Whole-genome sequence imputation between target and reference panel was conducted by Beagle (v4.1) using the default parameter settings (Browning and Browning, 2016). Specifically, the size of imputed region was set to 50,000 markers per window, and the overlap between windows was set to 3,000 SNPs. This software first constructed local haplotypes using the hidden Markov chain Monte Carlo (MCMC) algorithm and then resampled new estimated haplotypes for each individual based on a hidden Markov model (HMM).
Imputation accuracy should be further investigated in wholegenome sequence data because of the low density and common variants in 50k. Browning et al. and Williams et al. have fully exhibited the number of individuals present in a population is a crucial factor in determining how well the phase can be estimated for haplotype construction (Browning and Browning, 2011;Williams et al., 2012). Therefore, 109 whole-genome sequence pigs including 19 F 0 who were the progenitor of the 500 F 3 populations were also regarded as reference panel in order to obtain more accurate phase information. Then, the genotypic concordance rate and the squared correlation (R2) between best-guess imputed and the original variants as imputation accuracy. The genotypic concordance rate used a cross-validation strategy described in previous studies Pausch et al., 2017). More specifically, two thousand loci in the target sample were deleted randomly then imputed in the same strategy. The number of 2,000 alleles imputed correctly divided by total 2,000 loci (the allelic correct rate) was taken to calculate the accuracy of imputation. Finally, in order to balance the imputation accuracy and missing proportion in the next analysis process, we excluded the variants with call rate <90% and MAF <0.03.

GWAS
GEMMA was utilized to perform the association analyses underlining the standard linear mixed model (Zhou and Stephens, 2012). Sex and batch were included as fixed effects. Heritability was estimated by using −lmm procedure implemented in GEMMA using genomic relationship matrix. Population stratification and were adjust by including genomic relationship matrix. Briefly, this model is denoted as: where y is a n element vector of phenotypic values (or case/ control labels), α is a c-vector of fixed effects, β is the effect size of SNPs, W is a design matrix of covariates, x is a vector of genotypes at each locus, and u is the vector of random effects following the multivariate normal distribution MVN n (0, λτ −1 K), where τ −1 is the variance of the residual errors, and λ is the ratio between τ −1 and the variance of the residual errors; K is a known kinship matrix, ∈ is an vector of errors following the multivariate distribution MVN n (0, λτ −1 I n ), and I n is an n × n identity matrix. Normally, significance threshold of multiple test in chip array-based GWAS was adjusted by naïve Bonferroni corrections, which is 0.05 divided number of examined SNPs. However, this approach would lead to over correction and decreasing the detection power in GWAS as these tests are nonindependent for the linkage disequilibrium between markers. We herein used 5E−08 as a genome-wide suggestive significance threshold following Pe' er et al. and Johnson et al. (Pe' er et al., 2008;Johnson et al., 2010). The population stratification is one of the factors that affects the validity of genome-wide association study (Pearson and Manolio, 2008). To check if stratification exists in our result, quantile-quantile plots (Q-Q plots) were implemented to evaluate population stratification effects. The Q-Q plots were constructed with R software. Measures of linkage disequilibrium (r and r 2 ) between SNPs were estimated by plink 1.07 (Clarnette and Hutson, 1997), the default settings for minimum linkage between SNPs at threshold r 2 = 0.8.

GENETIC DIFFERENTIATION ANALYSIS
To elucidate whether there is genetic differentiation exist in scrotal hernia pigs and health pigs, we divided the affected pigs and the unaffected pigs into two groups, as the method did by Zhang et al. (2019) then assessed allele frequency differentiation using the unbiased genetic differentiation estimated of the fixation index (Fst). Akey et al. have fully described estimation of unbiased Fst fixation index in his paper using SNP dataset (Akey et al., 2002). Briefly, Fst was estimated as follows: where MSG represents the observed mean square errors for loci within populations, MSP denotes the observed mean square errors for loci between populations, and n c is the average sample size across samples, which incorporates and corrects for the variance in the sample size over population In the above formulae, n i and p Ai denote the sample size and the frequency of SNP allele A in the ith population, respectively, and p A is a weighted average of p A across populations. The negative Fst didn't have any biological interpretation and were set to 0 to fit the definition of Fst ranging from between 0 and 1 (Wright, 1951). The top 1% of loci according to genetic differentiation values was served as candidate regions to host resistance or susceptibility to pig scrotal hernia (Zhang et al., 2019).

LINKAGE DISEQUILIBRIUM AND LINKAGE ANALYSIS (LDLA)
The haplotypes of F3 on SSC8 were reconstructed using a hidden Markov model by beagle (Zhang et al., 2012) and then the graphical model for the haplotype clusters with beagle was directly generated, which is a directed acyclic graph (DAG). The parameters for both processes are set to scale equals 2 and shift equals 0.1.
Haplotypes within a cluster are likely to descend from the same ancestral haplotype and to carry the same DSV (DNA sequence variants) and combination of alleles, which is actually the principle used in linkage analysis. The linkage disequilibrium or association mapping information is generated by ancestral recombinations and detected by population level associations between individuals. Then, the clustered haplotypes were converted into diallelic markers by pseudomarker program, which can be imported into a program like R for statistical analysis. Thus, haplotype data contains both linkage and linkage disequilibrium information and can be imported into a mixed model framework: where Y is the vector of phenotypes, and b is fixed effects including sex and batch. The haplotypes could be treated as random here, as there are likely to be many of them, and some haplotypes will occur only a small number of times. Therefore, the random additive genetic effect following the distribution u N G ũ ( , ) 0 2 σ , in which G is the individual-individual similarity matrix, and σ u 2 is the polygenetic additive variance, and X and Z are incidence matrices for b and u, respectively. The residual random effect "e" following the distribution e~( , ) N I e 0 2 σ . The LDLA analysis was carried out using a homemade R scripts (Supplementary Data Sheet 2). The most likely position of the QTL was obtained by the 2-LOD drop method (Karim et al., 2011).

HAPLOTYPE SHARING ANALYSIS
The haplotypes in the target QTL region were constructed by fastPHASE. Firstly, we tried to find the sharing susceptibility haplotype by thoroughly scanning the haplotypes of affected individuals in F 3 population. Then, we tried to identify whether the same sharing susceptibility haplotype existed in F 2 affected individuals and tried to trace it to the F 1 and F 0 generations. It should be noted that we take the intersection of SNPs of F 3 and F 2 due to the different density of 50 and 60k chip.

CONDITIONAL ASSOCIATION TEST
To elucidate whether there are additional QTLs for scrotal hernia in the identified QTL region, we extracted genotypes of the top SNP and included as a covariate to the univariate linear mixed model, which was performed in the single-marker GWAS as we described above then performed a conditional test to retest the association between SNPs and phenotypes. If additional signal was detected, then there were multiple QTLs that cooperated to control scrotal hernia. Otherwise, there was only one QTL that affected scrotal hernia.

BOOTSTRAP TEST
The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement (Efron and Tibshirani, 1993). It can be used to estimate summary statistics such as the mean, standard deviation, confidence interval, or correlation coefficient, which is done by repeatedly taking small samples, calculating the statistic, and taking the average of the calculated statistics. We herein carried out bootstrap test to verify the reliability of GWAS in this study. First, we randomly resampled for 1,000 times with replacement, in which some affected individuals can be sampled for multiple times, while some may be sampled for 0 times, the total number of affected individuals that may either increase or decrease, and the same resample results were acquired in unaffected individuals. Then, we conducted GWAS for 1,000 times to see if there were still significant signals in the susceptibility region which was identified in our study.

Phenotype Statistics and SNP Characteristics After Quality Control
Incidences of scrotal hernia were estimated to be 0.7 and 2.7% in the ordinary F 3 population and in the specially designed F 3 population, respectively. It is obvious that the incidence of scrotal hernia in the specially designed F 3 population was significantly higher than in the ordinary F 3 population. Heritability for scrotal hernia was estimated at 0.39 using the standard linear mixed model, which implies that there is a genetic contribution to scrotal hernia. After quality control, a total of 42,365 SNPs and 246 pigs had retained for further analyses. Imputation was produced using Beagle software. The summarization of imputation results is presented in Table 2. After imputation, a total of 46,483,626 SNPs for 246 individuals were obtained, and 18,756,672 SNPs were retained after filtering with MAF > 0.03. The average genotypic concordance rate was 84.8%, and the average correlation between best-guess and true variants reached with an average of 71% after we delete sites where R 2 is equal to 0 and MAF is less than 0.03 (Supplementary Figure S2).

SUMMARY OF GWAS
We conducted a GWAS on the F 3 population in two scenarios, i.e., the target data before and after imputations. In the scenario with experimental 50k chips data, we identified a total of 18 SNPs that surpassed the genome-wide significance level ( Figure 1A). The most significantly associated SNP rs320409365 (P-value = 2.64 × e −14 ) locates at 124.1Mb within a 10.6Mb region (116.1-126.7Mb) on SSC8 (Table 3). In the scenario with imputed sequence data, 3,236 significant SNPs were located on SSC1,3,6,7,8,9,10,12,13,and 16 (Figure 1B), and the most significantly SNP rs319603861 (P-value = 1.52 × e −18 ) locates at 122.2Mb within a 21Mb region (115.5-136.5Mb) on SSC8 ( Table 4). In addition, to validate the possibility of spurious SNPs caused by population stratification, the Q-Q plots for these GWAS were explored (Supplementary Figure S3). The average inflation factors (λ) of the GWAS were 1.17 and 1.2 in the two scenarios, respectively. Indicating that population structures were properly corrected.

GENETIC DIFFERENTIATION SCORES
Fst were estimated to determine the extent of population differentiation between the affected and unaffected pigs. We identified a total of 26 SNPs beyond the empirical threshold on SSC8 ( Figure 2B); the strongest genetic differentiation loci rs320409365 (Fst = 0.535) locates at 124.1Mb within a 16.6Mb region (116.1-132.7Mb), indicating the affected pigs and the unaffected pigs had a large genetic differentiation in this interval. All the SNPs beyond the empirical threshold in this interval are shown in Table 5.

FINE MAPPING ON SSC8 USING LDLA AND GENETIC DIFFERENTIATION ANALYSES IN THE F 3 POPULATION
To further narrow down the confidence interval of SNPs SSC8 for scrotal hernia, we perform linkage and linkage disequilibrium (LDLA) for scrotal hernia on SSC8. The LDLA results showed the strongest association SNP was rs330263452 (P-value = 1.58 × 10 −17 ); the most likely confidence interval of the QTL was approximately 3Mb (121-123.99Mb), based on the LOD drop off 2 (Figure 2A). We herein concluded a common QTL region located on SSC8 between 121.02 and 123.99Mb mapped by LDLA and genetic differentiation analysis.

HAPLOTYPE SHARING ANALYSIS WITHIN THE CONFIDENCE INTERVAL
The result of haplotype sharing analysis on F 3 population was showed on Figure 3B. To put the result in detail, 15 of 18th affected pigs shared two types of haplotype in this 2.97Mb region flanked by markers rs318390967 and rs81404172. Those two shared haplotypes were associated with pig scrotal hernia and presumably Q 1 -bearing and Q 2 -bearing haplotypes, respectively. Further investigation revealed 27 of 228 unaffected pigs also carried the Q 1 or Q 2 haplotype. To test the risk ration and significance of individual carried Q haplotype, we summarized the number of affected pigs and unaffected pigs who carried and uncarried Q haplotype and conducted chi-square test with them (chi-square test P-value = 8.46 × E −15 ). This result is indicative of that the hypothesized Q haplotype was involved in the occurrence of scrotal hernia in pigs. Next, we tried to identified whether there is the same sharing susceptibility haplotype existed in F 2 affected individuals and trace this susceptibility haplotype back to the F 1 and F 0 generations. The result showed that 13 of the 19 affected pigs in the F 2 population also carried Q 1 or Q 2 haplotype flanked by markers rs81275702 and rs81404172 (Figure 4), and another carried other types of haplotypes. According to the pedigree ( Table 6), we also found that the parents of those 13 affected pigs also carried Q 1 or Q 2 haplotype in the same region, while the other 5 parents with other types of haplotypes individuals did not. Most of all, we found Q 1 and Q 2 haplotypes were come from of one White Duroc boars (F 0 -73) and three Chinese Erhualian sows (F 0 -74, F 0 -94, F 0 -124) when we traced those two haplotypes to the F 0 generation, respectively. Therefore, it is concluded that two susceptibility haplotypes underlying the SSC8 were identified for pig scrotal hernia, and there should be some important pathogenic mutations. In addition, it was worth mentioning that the significantly associated SNP rs81404013 (P-value = 8.72×E −12 ), rs318390967 (P-value = 8.72×E −12 ), and rs333147082 (P-value = 2.64×E −14 ) that contained in this confidence interval have strong linkage disequilibrium extents (r 2 > 0.9) to each other ( Figure 3A). However, the most significantly associated SNP rs320409365 (P-value = 2.62×E −14 ) has a low linkage disequilibrium extents (r 2 < 0.5) with those three loci. We take a region flanked by markers rs341392224 and rs326688253, which contain rs320409365, as well as it's left and right two loci. Then, we count the types of haplotype in this interval and take a chi-square test with them (Supplementary Table 1); the result showed that haplotype CACGT (P-value = 1.02×E −12 ) was significantly associated with scrotal hernia.

CANDIDATE GENE EIF4E FOR GENOME-WIDE SIGNIFICANT QTL
The 2.95Mb region on SSC 8 in pig (Ensembl 2018) encompasses eight annotated genes (ADH6, ADH4, ADH5, METAP1, EIF4E, TSPAN5, RAP1GDS1, STPG2), which indicated that few genes are the most likely candidate genes that caused scrotal hernia in pigs (Zerbino et al., 2018). Of the eight genes, EIF4E stood out as a potential candidate based on its biochemical and physiological functions. EIF4E is a protein-coding gene, which regulates the expression of the Eukaryotic translation initiation factor 4E protein, and translation initiation factor 4E is regulating the expression of MID1 gene (Pelletier et al., 1991;Jones et al., 1997). Winter et al. demonstrated that loss-of-function mutations in the MID1 gene may cause the malformations of the ventral midline, which always lead to a series of urogenital abnormalities, such as cryptorchidism, ambiguous genitalia, hypoplastic scrotum, and umbilical and inguinal hernias (Winter et al., 2016). In addition, both top SNPs rs333147082 (P-value = 2.64×e −14 ) and rs81404013 (P-value = 8.72×e −12 ) located in the intron of EIF4E gene when we condition the strongest significantly associated SNP rs333147082; no additional association signals appeared in this loci (QTL) was detected (Supplementary Figure S4), which showed the additional evidence for the causality of EIF4E incorporating functional and conditional association studies. These results were more evidence that the EIF4E is the susceptibility gene for pig scrotal hernias.

DISCUSSION
In the current study, we obtained 18,756,672 variants with 84.8% genotypic concordance rate. In the study on imputation, few researches reported the imputation accuracy from 60K to whole-genome sequence in pig, compared to most studies focused on imputation from low-density genotypes to 60k variants with correlations ranging from 0.938 to 0.992 for imputation from 3 to 60K (Cleveland and Hickey, 2013). Yan et al. showed an average genotypic concordance of 89% with imputing 60K to whole-genome sequence variants in a   (Yan et al., 2018), and Zhang et at. reported the genotypic concordance was 85.6% from 650K to whole-genome sequence variants using a stepwise imputation strategy in 1,363 Duroc pigs ; the genotypic concordance rate (84.8%) in our study is almost to their level. Moreover, we adopted R 2 to estimate imputation accuracy on account of genotypic concordance rate that is highly sensitive to MAF and is not appropriate for comparing genotypes with different MAF (Yan et al., 2018).
In the present study, R 2 decreased from 58 to 8% when MAF decreased from 0.1 to 0. The same trend was found in other studies (Daetwyler et al., 2014;Yan et al., 2018). And the average correlation between best-guess and true variants reached with an average of 71% after we delete sites where R 2 is equal to 0 and MAF is less than 0.03. Besides, Yan et al. showed that the average correlation is lower than the genotypic concordance rate, which was consistent with our result in this study (Yan et al., 2017). In addition, there are many other factors that affect the accuracy of imputation, such as the relationships between target panel and reference  and LD and reference size . Here we sequenced 19 ancestors of F3 to ensure our imputation reliability. Overall, imputation accuracy can be affected by different aspects, and high accuracy of imputation will lead to a reliable GWAS.
GWAS has become an exceedingly effective and widely used approach in identification of genetic variants associated with common diseases or complex traits since the first application of GWAS research was performed successfully in 2005 by Klein et al. (2005). Previously, by performing haplotype-based GWAS in F 2 population for scrotal hernia using Porcine SNP60 BeadChip, 108 chromosome-wise significance SNPs were identified to be associated with scrotal hernia; however, there was no marker surpassed the genome-wide significance level. The feasible reasons for the low detection power in this study was probably the low incidence and penetrance rate in F 2 population. But the most possible reason is the intricacy molecular genetic mechanism of scrotal hernia. So far, many international teams have identified the susceptibility loci of scrotal hernia on almost all chromosomes. Complex interactions between environmental factors and susceptibility alleles of multiple genes are the most normal process resulting in such a complex genetic background diseases. As a complex genetic defect, the polygene model may be the main pathogenesis, under polygene model that lots of susceptibility genes cause a change for disease. Therefore, whether a certain mutation is not directly related to scrotal hernia, but does have a role in the occurrence of it, this is why single-marker GWAS can't detect any significant signal in the F 2 population. Therefore, we generate a particular hernia population which was mated with full-sibs or half-sib of the affected individuals. The incidences of scrotal hernia will increase significantly in this population. Next, we will systematically describe the feasibility of our idea.
In the current study, we first designed a specially F 3 population to increase the incidence and penetrance rate by crossing fullsibs or half-sib of the affected individuals in the F 2 population. Statistics manifested that prevalence of scrotal hernia in the specially designed F 3 population was 3.6 times and 2.4 times higher than the ordinary F 3 population and F 2 population, respectively, indicating the F 3 specially designed population is completely successful in increasing the incidence rate of scrotal hernia. Most importantly, in the F 2 population, the frequency of a mutation associated with scrotal hernia will be greatly increased in F 3 specially designed population, as the health full-sibs or halfsib of the affected individuals in the F 2 population also have the mutant sites, which will pass on to the F 3 population.
As we predicted, 13 SNPs were located on SSC8 between 116.1 and 126.7Mb surpassed the genome-wide significance level after we conducted a GWAS on the specially designed F 3 population with experimental 50-k chip data, and this QTL must have come from F 2 , which overlaps with a region previously identified by Sevillano et al. (2015). The basic principle of singlemarker GWAS was to test association between phenotypes and genotypes. Normally, this association was indirect correlation as the causative mutation was not included in the study locus. Potentially, significant signals could be missed in a GWAS analysis if there were low LDs among paired markers. To improve the LD between markers, we performed imputation analysis by increasing the marker density in the study population using 109 sequenced data as reference panel. Consequently, we obtained 18,756,672 variants with relatively high imputation accuracy (average CR = 84.8%). After performing the wholegenome association study with sequence data, 3,252 significant SNPs reached the significant level. Three regions located on SSC3, SSC8, and SSC10 were similar to corresponding interval previously identified by Sevillano et al. (2015), especially the region on SSC8 between 115.6 and 136.5Mb overlaps a region they previously identified. To our knowledge, it is the first time that the other eight QTL regions identified on SSC1,6,7,9,13,and 16 are found to be associated with scrotal hernia, although some studies have reported that different regions on these chromosomes harbor QTL for scrotal hernia.
According to our original intention, we identified 13 SNP loci significantly associated with scrotal hernia on chromosome 8 through GWAS analysis with the specially designed F 3 population. In the subsequent analysis, we divided the affected individuals and unaffected individuals into two independent groups and calculated the genetic differentiation index to verify that there is genetic differentiation on SSC8. The result showed that a strong genetic differentiation signal located on rs320409365 (Fst = 0.535) within a 16.6Mb region (116.1-132.7Mb) on SSC 8 was detected. This result indicated that the affected pigs and the unaffected pigs had a greater genetic differentiation in this confidence interval. Moreover, there is a high coincidence of the top SNPs detected through genetic differentiation analyses and GWAS. Additionally, in consideration of single-marker GWAS, it was hard to properly estimate the confidence interval of the detected QTL, as LD varied severely among nearby SNPs while haplotypes have stable LD than SNPs. Thus, we conducted haplotype-based LDLA analysis, by simultaneously taking advantage of recent and ancestral recombination events to increase the efficiency and detect confidence interval. The LDLA results showed that the SNP with the strongest association at the locus was rs330263452 (P-value = 1.58 × 10 −17 ), and the most likely confidence intervals around the 121-123.99Mb region on SSC8. Furthermore, we found out that the confidence intervals mapped by LDLA contained within the region mapped by genetic differentiation analysis. We narrow the confidence interval to 2.99Mb by picking up the intersection of those two intervals for further analysis.
Lastly, we identified two susceptibility haplotypes underlying the SSC8 associated with scrotal hernia after performed a haplotype sharing analysis, and those two haplotypes were from one White Duroc boar (F 0 -73) and three Chinese Erhualian sows (F 0 -74, F 0 -94, F 0 -124), respectively. It is incomprehensible that the White Duroc boar (F 0 -73) carried the susceptibility haplotype, but it was unaffected. Actually, whether a certain mutation is not directly related to scrotal hernia as we explained earlier.
Similarly, 163 of 497 unaffected pigs in the F 2 population also carried the susceptibility haplotypes, echoing the result that there was no significant signal when we performed GWAS in the F 2 population. When we merge the F 2 and F 3 populations and then conducted chi-square test with them (chi-square test P-value = 8.32×E −11 ), this result is also indicative of that the hypothesized Q haplotype was involved in the occurrence of scrotal hernia in pigs. In addition to discovering two susceptibility haplotypes, we further found that there are nine annotated genes in this 2.95Mb interval in total, and the EIF4E was selected as potential candidate gene based on its biochemical and physiological functions.
Although there are some crucial discoveries revealed by these studies, there are a slice of limitations to our study, such as the relatively small number of samples in the F3 population. Therefore, we herein carried out bootstrap test to verify the reliability of GWAS in this study. The result showed that there are 957 of the 1,000 GWAS Golden diamonds and red diamonds represent the Q 1 and Q 2 haplotypes with affected pigs, respectively. The last six lines indicate that three affected pigs who carried other types of haplotypes. September 2019 | Volume 10 | Article 890 Frontiers in Genetics | www.frontiersin.org FIGURE 4 | The haplotype sharing analysis in the F 2 population. The figure showed that Q 1 and Q 2 haplotypes contained in F 2 affected individuals and traced back to the F 1 and F 0 generations. The last 12 lines indicate six affected pigs that carried other types of haplotypes. that were detected significant signals in the 116-126Mb interval on chromosome 8, which indicated that the fluctuation in the number of affected and unaffected individuals has no effect on GWAS (FDR < 0.05). Therefore, the significant signals obtained in our GWA study were not accidental but were caused by differences in the genomes of affected and unaffected individuals, which were reliable.

CONCLUSION
In summary, in the first place, we discovered a major quantitative trait loci (QTL) for pig scrotal hernia on chromosome 8 in an F 3 specially designed population using GWAS. There is one more point: two susceptibility haplotypes (Q 1 and Q 2 ) flanked by markers rs81275702 and rs81404172 and one potential causal gene underlying the SSC8 were identified through a series of methods including genetic differentiation analysis, LDLA, and haplotype sharing analysis. Last but not the least, we explain why many international research teams do not have a high repeatability of the results of scrotal hernia research, and some research studies haven't even found any associated locus with scrotal hernia. Further studies will be devoted to confirming the detected haplotype and gene in outbred populations.

DATA AVAILABILITY
The genotypic data of 246 F3 individuals as well as the phenotype of scrotal hernia for this study can be found in the figshare Digital Repository (https://figshare.com/s/d661962aa6c0740caeab), and the genotypic data of 516 F2 individuals analyzed for this study can be found in the Dryad Digital Repository (https://doi. org/10.5061/dryad.7kn7r) (Ma et al., 2013). The raw reads of the whole-genome sequence can be found from the NCBI sequence read archive (SRA) under the accession codes SRA065461 and SRP159212 (Ai et al., 2015;Yan et al., 2018).

ETHICS STATEMENT
This study was approved by the ethics committee of Jiangxi Agricultural University. All procedures including experimental animals established and tissue collection were performed in accordance with the guidelines approved by the Ministry of Agriculture of China.

AUTHOR CONTRIBUTIONS
LH and ZZ conceived and designed the experiments. WX, GY, and TH analyzed the data. DC and SX contributed materials and analysis tools. WX, ZZ, and LH wrote the manuscript. All authors read and approved the final manuscript.