Fine Mapping of a Major Backfat QTL Reveals a Causal Regulatory Variant Affecting the CCND2 Gene

Backfat is an important trait in pork production, and it has been included in the breeding objectives of genetic companies for decades. Although adipose tissue is a good energy storage, excessive fat results in reduced efficiency and economical losses. A large QTL for backfat thickness on chromosome 5 is still segregating in different commercial pig breeds. We fine mapped this QTL region using a genome-wide association analysis (GWAS) with 133,358 genotyped animals from five commercial populations (Landrace, Pietrain, Large White, Synthetic, and Duroc) imputed to the porcine 660K SNP chip. The lead SNP was located at 5:66103958 (G/A) within the third intron of the CCND2 gene, with the G allele associated with more backfat, while the A allele is associated with less backfat. We further phased the QTL region to discover a core haplotype of five SNPs associated with low backfat across three breeds. Linkage disequilibrium analysis using whole-genome sequence data revealed three candidate causal variants within intronic regions and downstream of the CCND2 gene, including the lead SNP. We evaluated the association of the lead SNP with the expression of the genes in the QTL region (including CCND2) in a large cohort of 100 crossbred samples, sequenced in four different tissues (lung, spleen, liver, muscle). Results show that the A allele increases the expression of CCND2 in an additive way in three out of four tissues. Our findings indicate that the causal variant for this QTL region is a regulatory variant within the third intron of the CCND2 gene affecting the expression of CCND2.


INTRODUCTION
Backfat (BF) is an important trait in pork production included in the breeding objectives of genetic companies for decades (Merks, 2000). Pig commercial lines have been selected for efficient meat production selecting for growth and leanness traits and thereby reducing backfat. However, a strong selection for decreased backfat also decreases other fat-related traits such as intra-muscular fat, which is highly valuable for meat quality (Lonergan et al., 2001;Fontanesi et al., 2012;Chen et al., 2014).
Adipose tissue has an important role as the storage of energy (Wang et al., 2016), however, excessive fat may cause economic implications in pig breeding affecting growth, feed efficiency, and meat quality (Lonergan et al., 2001). Furthermore, fat and fatty acids contribute significantly to the quality and nutritional content of the meat (Wood et al., 2008). The amounts of saturated fatty acids (SFA), monounsaturated fatty acids (MUFA) and polyunsaturated fatty acids (PUFA) are important for the meat processing industry.
Hence, to better understand quantitative traits such as backfat, further insight in the genetic basis of these traits is required. The development of commercial SNP chips has enabled genome-wide association studies (GWAS) to efficiently map regions throughout the genome affecting fat-related traits (Fontanesi et al., 2012;Fowler et al., 2013;Okumura et al., 2013;Do et al., 2014;Gozalo-Marcilla et al., 2021). While GWAS identify significant marker associations, the current commercial SNP chip density often leads to clusters of markers (due to extended linkage disequilibrium) covering a region that is still too large to allow accurate identification of the responsible gene(s) or variant(s). Hence, the need for further fine mapping is required to find causative relations between gene(s) or variants that affect economically-important traits such as backfat. The detection and integration of expression quantitative trait loci (eQTLs) can facilitate further fine mapping of QTL regions to study the genetic architecture of complex traits. The eQTL analysis allows to identify variation associated with changes in gene expression that underly the phenotypic differences observed in complex traits.
One region on chromosome 5 (66 Mb) has shown strong association with backfat in a wide range of populations. Gozalo-Marcilla et al. 2021 proposed the FGF23 gene as likely causal given the association with phosphate homeostasis. In this study, we investigate the same QTL region on chromosome 5 with strong association with backfat across four commercial pig populations. Further, we aimed to fine map this QTL region to identify causative mutation(s) related to backfat and to better understand the biology of this trait. We concluded that the causative variant of this QTL region is a regulatory SNP in the third intron of the CCND2 gene.

Phenotypic Data
The data consisted of five commercial pig populations; Synthetic (Large-White based), Pietrain, Landrace, Large-White and Duroc (Table 1), obtained from herds owned by Topigs Norsvin. The phenotypic data was composed by backfat records, measured at the end of the test period (average live weight of 120 kg) in each evaluated population. The data were classified in two datasets: PHENOTYPED, which consisted of all phenotyped animals and their contemporary's animals; and GENOTYPED, which is a subset of phenotyped animals with genotypes and phenotypes ( Table 1).
Using PHENOTYPED dataset, phenotypes were pre-corrected for all non-genetic effects to use as response variable in further analysis. For that, the non-genetic effects were estimated within population using the following animal linear model in ASReml v3.0 (Gilmour et al., 2009): where BF ijkl is the phenotypic observations (BF) of the k animal; μ is the overall mean; sex i is the fixed effect of sex i; hym j is the fixed effect of the herd-year-month j of birth; a k is the random additive genetic effect of animal k; litter l is the random effect of litter l and e ijkl is the random residual effect. It was assumed that a~N (0, A σ 2 a ), l~N (0, I σ 2 l ) and ẽ N (0, I σ 2 e ), where σ 2 a , σ 2 l and σ 2 e are the additive genetic variance, litter variance and residual variance, respectively; A is a pedigree-based numerator relationship matrix and I is the identity matrix.

Genotyping and Quality Control
In total, 134,887 animals were genotyped across the different populations and SNP chip densities ( Table 2). The GENOTYPED dataset includes animals genotyped using different Illumina's medium density SNP chips (50, 60 and 80K), as well as animals genotyped with the high-density Axiom ™ Porcine Genotyping Array 660K (Thermo Fisher Scientific, Waltham, Massachusetts, United States) ( Table 2).
Genotype quality control analysis was performed within population and SNP chip to exclude SNPs with genotype call  rate <0.95, minor allele frequency <0.01, strong deviation from Hardy-Weinberg equilibrium p-value < 1 × 10 -12 , SNPs located on sex chromosomes and unmapped SNPs. The positions of the SNPs are based on the Sscrofa11.1 assembly of the reference genome (Warr et al., 2020a). All genotyped animals had a frequency of missing genotypes <0.05 and were therefore all kept for further analyses.
After the quality control, the medium density genotypes were imputed towards the high density genotypes within population using Fimpute v2.2 (Sargolzaei et al., 2014). After imputation, a total of 31,183; 16,189; 38,298; 36,414 and 11,274 animals and 488,585, 504,525, 443,619, 518,659 and 417,532 SNPs were available for the Synthetic, Pietrain, Landrace, Large-White and Duroc populations, respectively, and used for the GWAS.

Genome-Wide Association Analysis
A single-SNP GWAS was performed with the imputed GENOTYPED dataset for each population using the following linear animal model in GCTA software (Yang et al., 2011(Yang et al., , 2014: where ypk is the pre-corrected phenotype of the k animal (precorrected for all non-genetic effects); μ is the average of the precorrected phenotype; X is the genotype, coded as 0, 1, or 2 copies of one of the alleles of the k animal for the evaluated SNP;β is the unknown allele substitution effect of the evaluated SNP; u k is the residual polygenic effect, assuming u~N (0, G σ 2 u ), which accounted for the (co)variances between animals due to relationships by formation of an G matrix (genomic numerator relationship matrix build using the imputed genotypes), σ 2 u is the additive genetic variance; and e k is the random residual effect which was assumed to be distributed as N (0, I σ 2 e ). The proportion of phenotypic variance explained by a SNP was defined as σ 2 SNP σ 2 P , where σ 2 P is total phenotypic variance (sum of the additive and residual variances) which was estimated based on model (2) without a SNP effect, and σ 2 SNP 2pq(β) 2 , where p and q are the allele frequencies andβ the estimated allele substitution effect of the evaluated SNP. Variance components were also estimated using model (2) without a  SNP effect and the heritability was defined as the proportion of σ 2 P explained by the genetic variance. The association between a SNP and the phenotype was declared significant when-log 10 (p-value) > 8.

Fine-Mapping
To search for possible causal mutations at the chromosome 5 backfat locus, fine mapping analyses using high-throughput sequencing (WGS, ChIP-Seq, ATAC-Seq and RNA-Seq data) was performed using following steps: (1) building and calculation of haplotype frequencies; (2) sequence data processing and evaluation of variants within the haplotype region; (3) evaluation of protein interactions and gene expression.

Haplotype Analysis
From the GWAS results, the most significant SNP (lead SNP) was identified. Subsequently we performed a haplotype analysis upand downstream of the lead SNP. For that, firstly the Beagle v5.1 software (Browning et al., 2018) was used to phase the imputed genotypes taking 0.5 Mb downstream and 0.5 Mb upstream of the lead SNP (SSC5:66,103,958bp) in the QTL region SSC5:66 ± 0.5 Mb. The lead SNP and the 40 markers with closest vicinity of the lead SNP (20 markers on each side) were selected and used in the HaploPi pipeline (Oliveira, 2019) to build and calculate the haplotype frequencies. After performing the haplotype analyses, the core haplotype (SSC5: 66094630-66140348) was used to perform a haplotype-based association analysis using the number of copies of the core haplotype. For that, the same mixed-model described for the single-SNP GWAS (model 2) was used, in which the SNP genotypes were replaced by the genotype of the core haplotype (0, 1 or 2).

RNA-Sequencing in Crossbred Animals in Four Tissues for eQTL Analysis
Four different tissues: liver, spleen, lung and muscle were subjected to RNA extraction from 100 animals. The animals were F2 crosses resulting from a F1 sow (Landrace*LargeWhite) and a Synthetic boar line. The collected tissue was stored with RNAlater (Thermo Fisher Scientific) at -80°C until further use. RNA was extracted from the 400 samples using the QIAshredder  homogenizer kit (Qiagen) and RNA extraction was performed with the Rneasy kit (Qiagen) following manufacture's guidelines. Once all the samples passed the required quality control parameters, they were sequenced at 150 bp paired-end in an Illumina 6000 sequencing platform. DNA from 87 of these animals was also extracted from the spleen tissue and was subjected to high density genotyping with the Axiom ™ Porcine Genotyping Array (Thermo Fisher Scientific) that queries 660K variant markers.
For the bioinformatic analysis, RNA-Seq reads were trimmed using Trim Galore v0.3.7 (https://www.bioinformatics.babraham. ac.uk/projects/trim_galore/). We mapped the RNA-seq data to the Sscrofa11.1 reference genome and Ensembl version 104 using STAR v2.7.8 (Dobin et al., 2013). Alignments with an alignment MAPQ score <30 were filtered using SAMtools v0.1.19 (Li and Durbin, 2009). Gene counts were determined using htseq-count v0.11.1, and then TMM-normalized as counts per million (CPM) with EdgeR (Robinson et al., 2009). Plotting of expression levels per genotype class was performed using the seaborn python package (Waskom, 2021). For the eQTL analysis, genotypes were firstly filtered as in (Derks et al., 2021) with plink v.1.9 (Purcell et al., 2007). Only genes ±1 Mbp from the target lead SNP of interest were used for the analysis. The single-SNP association analysis was carried for each of the tissues with the GCTA v1.25.3 software (Yang et al., 2011) with the following model: where (Y i ) is the CPM gene abundance modeled as a function of the population mean (µ), fixed effect of each SNP (SNP i ), and a random residual effect (e i ). The genetic variance explained by a SNP (σ 2 SNP = 2pqα 2 ) was estimated based on the allele frequencies (p and q) and the estimated allele substitution effect (α). The proportion of phenotypic variance explained by the SNP was defined as σ 2 SNP /σ 2 P , where σ 2 P is total phenotypic variance (sum of the additive and residual variances) which was estimated based on model (3) without a SNP effect. Significant association between a SNP and expression was detected using a p-value < 0.05.

Genome-Wide Association Study
In this study, the use of large-scale genotype data allowed us to identify a A/G SNP (rs80985094: 5:66,103,958) on SSC5 with a strong association with backfat across populations. This SNP showed significant association with backfat in four (Synthetic, Pietrain, Landrace and Large White) out of the five evaluated pig populations (Figure 1). The association between rs80985094 and backfat was not identified in the Duroc population because the G allele, associated with increased backfat is fixed.
The most significant SNP (rs80985094) showed a -log10 (p-value) of 57.67, 43.29, 67.61 and 37.91 in the Synthetic, Pietrain, Landrace and Large-White population, respectively. The lead SNP, located at 66,103,958 bp on SSC5, explains up to 6% of the genetic variance in backfat (Table 4). Among the four populations which showed significant association between the lead SNP and backfat, the frequency of the allele associated with increased backfat (G) is the highest in Large-White population (0.79). In the other populations, the G allele is the minor allele (Table 4).

Haplotype Analysis
We phased the genotypes using Beagle 5.1 (Browning and Browning, 2007) and selected the most frequent haplotypes either wich contains the A and G allele in the four populations. Based on the haplotype analyses, we identified a core haplotype with four SNPs surrounding the lead SNP (GCAAG) present in all populations (size~28 Kb). Assuming that recombination reduced the common interval surrounding a new variant reducing back fat, the region containing the causal mutation was expanded by two more SNPs (one SNP upstream and one SNP downstream) to search for possible functional variants. This increases the final region of interest to 46 kb to search for the causal variant (AGCAAGC = Synthetic; AGCAAGG = Pietrain; GGCAAGA = Landrace; AGCAAGG = Large-White). The frequencies of these core haplotypes are 57.70%, 35.87%, 22.67% and 16.85% for each population (Synthetic, Pietrain, In red, lead SNP [reference allele (G)]. In green, the same region covering the core haplotype as in Table 5. Landrace and Large-White, respectively) ( Table 5). No common core haplotype surrounding the lead SNP could be identified across the populations surrounding the G allele (Table 6). Hence, we believe that the G to A mutation happened on the GCGAG haplotype and was subsequently selected to produce pigs with less backfat. This underlines the hypothesis that the G allele is the wild-type variant and the A allele marks the derived new variant. In line with this hypothesis, the core haplotype containing the wild type G allele could be identified at low frequency. Linkage disequilibrium (r 2 ) between the variants in the core haplotype and the lead SNP (A) range from r 2 = 0.03 and r 2 = 0.96 ( Table 7). The additive effect of the A and G allele on backfat across the evaluated pig populations is shown in Figure 2.

Haplotype-Based Association
A haplotype-based association analysis was performed to compare the association with the lead SNP and the core haplotype (GCAAG) as defined in Table 5 (orange haplotype). The association with the core haplotype was lower than the lead SNP (Table 8). This gives further support that the lead SNP is causal.

Whole-Genome Sequence Data Analysis
For fine mapping, we evaluated a total of 659 whole genome sequenced animals, 222 sequenced animals are homozygous for the A allele, 252 animals are heterozygous, and 185 animals are homozygous for the G allele. Analyzing the data of these subset of animals, we found two variants (besides the lead SNP) with high frequency across populations and in high LD with the lead SNP ( Table 9). The LD between these variants and the lead SNP varies among the populations ( Table 9). The lead SNP and SSC5:66097445 are annotated within the intronic region of the Cyclin D2 (CCND2) gene (ENSSSCG00000038694), while SSC5:66190273 is annotated in intergenic region upstream of the CCND2 gene.

RNA-Seq Analysis in Crossbred Animals Supports CCND2 as the Causal Gene
We studied the expression of the CCND2 and neighboring genes in a population of 100 crossbred animals in four different tissues   (liver, spleen, lung, muscle). Table 10 shows the association of the lead SNP with the expression of genes in the backfat locus (SSC5: 65-67 Mb) in four tissues.
The results show that the expression of the CCND2 gene is significantly associated with the lead SNP in liver, lung and spleen ( Figure 3). Also, the expression of the AKAP3 (all four tissues, although with low expression CPM: 3-5) and the C12orf4 (liver, muscle) gene show significant association with the lead SNP. The AKAP3 gene encodes a member of A-kinase anchoring proteins (AKAPs). This protein is highly expressed in spermatozoa may function as a regulator of motility, capacitation, and the acrosome reaction. The C12orf4 gene encodes a protein involved in mast cell degranulation. FIGURE 3 | Association of CCND2 expression with the lead SNP in four 87 crossbred individuals (4 tissues). Figure shows CCND2 expression in different genotype classes. G allele is associated with more fat, while the A allele is associated with faster growth. Figure shows that the A allele is significantly associated with increased expression of the CCND2 gene in liver (p-value: 3.0 × 10 −3 ), spleen (p-value: 1.6 × 10 −5 ) and lung (p-value: 0.02).

CCND2 Regulatory Region
The core haplotype region ranges from SSC5:66094630 to SSC5: 66140348 and it covers the CCND2 gene. Therefore, to discover potentially regulatory elements, we used ChIP-seq and ATAC-Seq public datasets to investigate the region of interest. ChIP-Seq data was used to detect enhancers and promoters marked by H3K4me3, H3K4ac27 and H3K4me1. ATAC-Seq was used to detect open chromatin. We have called broad peaks for marks on ATAC-Seq and ChIP-Seq (Figure 4 and Supplementary Material S2). All markers have peak signals just downstream of the lead SNP and are partially covered within the core haplotype region (Supplementary Text S1). In human a predicted promoter is located at the lead SNP location, further supporting a potential regulatory role of the lead SNP (Supplementary Material S3).

DISCUSSION
Backfat is one of the most important traits in pork production and is closely related to meat quality and efficiency-related traits in pigs. The composition of lipid in backfat is mainly triacylglycerol (Roongsitthichai and Tummaruk, 2014) which exhibits a strong genetic component but is also influenced by environmental factors like feed intake (Wood et al., 1989)In the current study we have confirmed the presence of a QTL region on SSC5 with strong association with backfat in four out of five evaluated pig populations.
The lead SNP (rs80985094) explains up to 6% of the genetic variance and 3% of the phenotypic variance of backfat in the evaluated populations. However, the G allele of the lead SNP is fixed in the Duroc population, associated with increased backfat, explaining the absence of this QTL in Duroc. Durocs are known to have generally higher fat and intramuscular fat content compared to populations of Large White, Landrace, or Pietrain origin.
Several studies have reported QTLs and candidate genes to possibly identify functional mutations in different loci associated with backfat in pigs (Fontanesi et al., 2012;Zhu et al., 2014;Pant et al., 2015). We have identified one strong candidate SNP (rs80985094) located on SSC5:66103958 with very strong association with backfat across populations. Based on the results reported in the PigQTLdb, 19 QTL were previously reported in the same region (SSC5:66 ± 0.5 Mb), in which three of them were related to backfat (QTL identification = 139184, 139225 and 22298) (Onteru et al., 2013;Reyer et al., 2017). According to Reyer et al. (2017), the lead SNP (rs80985094) was also identified as the most significant SNP in the identified region associated with backfat. However, no functional candidate gene was described by these authors for this QTL region. Recently (Delpuech et al., 2021), describe a backfat QTL in the same region in French Large White at rs342862483 (SSC5:66100317) and (Gozalo-Marcilla et al., 2021) pinpoint FGF23 as a candidate gene in the same QTL region in different pig lines, these differences might be caused due the number of markers used in each study (660K vs 80K). However, given the proximity of the candidate causal SNP and the influence on the expression we argue that the CCND2 gene is the causal gene affecting backfat at this locus.
Pietrain, Synthetic, and Landrace populations have high frequency for the A allele, which is associated with less backfat, while Large White has a high frequency for the G allele which is associated with more backfat. This result is in line with the selection in the Pietrain, Synthetic, and Landrace populations for leanness and feed efficiency compared to the Large-White population.
The CCND2 gene was found to be associated with adipose tissue development and differentiation (Vaittinen et al., 2011). Moreover (Le et al., 2017), identified the CCND2 gene as a possible candidate for backfat conformation in Landrace. These findings are now supported by our results showing that the A allele of the lead SNP is associated with increased FIGURE 4 | ATAC-Seq and ChIP-Seq (H3K27ac, H3K4me1 and H3K4me3) marks in the CCND2 backfat locus. Broad Peaks in the core haplotype region were called using MACS2.
Frontiers in Genetics | www.frontiersin.org May 2022 | Volume 13 | Article 871516 8 expression of the CCND2 gene in liver, spleen and lung. This finding is in line with previous observations showing that adipogenic triggering leads to a strong downregulation of cell cycle and proliferation genes (including CCND2) (Marcon et al., 2019). Moreover, a study in chicken showed that overexpression of Dnmt3a1 significantly upregulated the mRNA level of cellcycle-related genes including CCND2, but decreasing mRNAs and proteins involved in adipogenesis (Abdalla et al., 2018). In addition, we discovered regulatory elements in the core haplotype region flanking the lead SNP supporting a regulatory role of the causal variant. Together the results reported in literature show that the overexpression of cell-cycle-related genes including the CCND2 gene suppresses adipogenesis.
In addition to the CCND2 gene, the core haplotype covers a region also overlaps with the LOC106507534 (a non-conding RNA). In human, the CCND2 gene region is located on chromosome 12 and LOC106507534 (pig annotation) is orthologous to a long non-coding RNA (lncRNA) named CCND2-AS1 (human annotation). This lncRNA is associated with regulation of Wnt/b-catenin signaling which directly regulates cell proliferation, cell polarity and cell fate determination during embryo development and tissue homeostasis (Logan and Nusse, 2004;Zhang et al., 2017). Moreover, Zhang et al. (2017) have identified that CCND2-AS1 plays an important role in glioma cells proliferation through the regulation of the Wnt/b-Catenin pathway. Hence both CCND2 and CCND2-AS1 are involved in cell proliferation which is associated with the differentiation of the adipogenic and myogenic cell lineages in early development. However, the pig CCND2-AS1 was not expressed in our RNA-seq datasets. At the light of these facts, our findings suggest that CCND2 can also regulate Wnt/b-Catenin pathway in prenatal life impacting fat deposition even after birth in pigs. However, the exact involvement of CCND2-AS1 with this QTL region remains unclear.
We hypothesize that the lead SNP is the causal variant by affecting CCND2 gene expression. The association found on SSC5 is highly significant, and the lead SNP explains up to 6% of the genetic variance of backfat in the evaluated populations. The Duroc population was the only population in which we did not identify the QTL. Interestingly, the Duroc population has been selected for improved meat quality and fixation of the G-allele may indicate that this SNP or underlying gene can have some influence in backfat and meat quality. On the other hand, populations highly selected for leanness and feed efficiency (Synthetic, Landrace and Pietrain) show a low frequency of the G allele. If the mutation from G to A occurred in a common ancestor of the white lines, selection will have acted against the G allele in these populations. Due to recombination, the size of the original haplotype on which the mutation occurred is reduced with each generation. Indeed, we still find a core haplotype surrounding the A variant which the four white lines have in common. Furthermore, we also find the original core haplotype with the wild type G allele GCGAG at low frequency in all four white populations. Indeed, this haplotype increases back fat in all four lines (Table 11). This further supports the hypothesis that the lead SNP is the causative mutation itself as we cannot identify any other variant in high LD across all four populations in this core region surrounding the A allele. It remains to be elucidated why the G allele is still present at a rather high frequency in commercial pig populations that have been heavily selected for leanness. However, this could be linked to correlated effects on other traits.
Interestingly, the A allele, associated with less backfat seems to be the ancestral allele in other mammals such as human, horse and cow, where the A allele is present. However, we did not find animals that have the A allele in Asian or European wild boars, indicating that for the pig lineage, the G allele is the ancestral allele. We did however find the A allele in several European local breeds including the Angler Sattelschwein, Bunte Bentheimer, and in Tamworth (Supplementary Material S3). The Angler Sattelschwein is a cross between the German black-and-white Landrace and the Wessex Saddleback, while the Tamworth breed is relatively lean. These results show that the A allele has likely originated in white lines in Europe. However, we cannot exclude recent crossbreeding of European local breeds with breeds that carry the A allele.
We report the CCND2 gene as a strong candidate gene for backfat deposition in pigs supported by changes in expression and involvement in adipogenesis. However, the exact molecular mechanisms underlying this QTL region remain to be discovered.

CONCLUSION
In this study, we identified a QTL region on SSC5 with strong association with backfat across four pig populations. The lead SNP, rs80985094, explained up to 6% of genetic variance of backfat. Gene expression analysis shows that the SNP is associated with the expression of the CCND2 gene which is involved in adipogenesis during prenatal development and thereby influencing the expressed phenotype in the adult life. Moreover, we believe that the lead SNP is the causal mutation based on fine mapping using the WGS data and the significant eQTL result for the CCND2 gene.
Our results will open up further study to validate the effect of the causal mutation and gene and to better understand its role on fat deposition and muscularity in pigs.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because the data used in this study have been obtained as part of routine data collection from Topigs Norsvin breeding programmes, and not specifically for the purpose of this project. Therefore, approval of an ethics committee was not mandatory. Sample collection and data recording were conducted strictly according to the Dutch law on animal protection and welfare (Gezondheids-en welzijnswet voor dieren).