A Missense Mutation in the MYBPH Gene Is Associated With Abdominal Fat Traits in Meat-Type Chickens

Chicken is an important source of protein for human nutrition and a model system for growth and developmental biology. Although the genetic architecture of quantitative traits in meat-type chickens has been the subject of ongoing investigation, the identification of mutations associated with carcass traits of economic interest remains challenging. Therefore, our aim was to identify predicted deleterious mutation, which potentially affects protein function, and test if they were associated with carcass traits in chickens. For that, we performed a genome-wide association analysis (GWAS) for breast, thigh and drumstick traits in meat-type chickens and detected 19 unique quantitative trait loci (QTL). We then used: (1) the identified windows; (2) QTL for abdominal fat detected in a previous study with the same population and (3) previously obtained whole genome sequence data, to identify 18 predicted deleterious single nucleotide polymorphisms (SNPs) in those QTL for further association with breast, thigh, drumstick and abdominal fat traits. Using the additive model, a predicted deleterious SNP c.482C > T (SIFT score of 0.4) was associated (p-value < 0.05) with abdominal fat weight and percentage. This SNP is in the second exon of the MYBPH gene, and its allele frequency deviates from Hardy–Weinberg equilibrium. In conclusion, our study provides evidence that the c.482C > T SNP in the MYBPH gene is a putative causal mutation for fat deposition in meat-type chickens.


INTRODUCTION
Chicken is an important source of protein for human nutrition and a model system for growth and developmental biology (Ellegren, 2005). The genome sequence of the Red Jungle Fowl (Gallus gallus gallus), considered the ancestor of the domestic chicken (G. g. domesticus) (Abplanalp, 1992;Dodgson et al., 2011), and completed in 2004 (Hillier et al., 2004) has allowed the development of new tools for genetc studies. High throughput sequencing of several breeding lines has identified millions of single nucleotide polymorphisms (SNPs) across the chicken genome (Rubin et al., 2010;Boschiero et al., 2018) and these led to the development of high-density SNP panels (Kranis et al., 2013). The most common and frequent DNA variants are SNPs, with a density of 13 SNPs/kb in a Brazilian meattype chickenline (developed by Embrapa Swine and Poultry) (Boschiero et al., 2018).
High-density SNP panels were previously used in genome wide association studies (GWAS) to identify QTL for body weight (Gu et al., 2011), several measures of fatness Moreira et al., 2018b), breast and leg muscle weights, wing weight (Xie et al., 2012), carcass and eviscerated weights . However, discovering the causative mutations underlying quantitative trait loci (QTL) remains challenging (Ahsan et al., 2013;Moreira et al., 2018a). Linkage disequilibrium between the genetic variant present on a SNP panel and the casual mutation allows QTL detection by genome-wide association study (GWAS), but fine-mapping studies are necessary to identify which sequence mutation within the QTL is the causative mutation responsible for the phenotype of interest. Combining statistical evidence from association studies with functional annotations of the genes or genetic variants, is a helpful approach to identify potential causal mutations (Spain and Barrett, 2015).
Single nucleotide polymorphisms can have a direct impact on coding or indirect impact on regulation of gene expression and thus affect traits of economic interest in animal models and livestock species (Roux et al., 2014). SNPs that occur in coding regions can be classified as missense when triplets code for different amino acids, synonymous when triplets code the same amino acid or nonsense when the mutation results in a premature stop codon. Missense SNPs can be predicted as deleterious or tolerated using the SIFT tool (Sorting Intolerant From Tolerant) (Ng and Henikoff, 2003). Changes at wellconserved positions tend to be deleterious because important amino acids are conserved across a protein family Henikoff, 2002, 2003). When a SNP is predicted as a deleterious mutation, it means that a change in the amino acid sequence likely affects the protein structure and function (Ng and Henikoff, 2003;Kumar et al., 2009), and consequently, may alter one or more phenotypes. Previous studies have identified missense SNPs associated with body weight at hatch, semi-eviscerated carcass weight, eviscerated carcass weight, leg muscle weight and carcass weight (Wang et al., 2015), abdominal fat weight, body weight at different ages and body size traits (Han et al., 2012). Phenotype may also be affected by SNPs located on potentially neutral regions (Berulava and Horsthemke, 2010;Jo and Choi, 2015), as these may be regulatory.
Previous whole-genome resequencing studies performed in parental individuals from the meat-type TT chicken reference population studied herein identified several predicted deleterious SNPs (Moreira et al., 2018b). However, the potential role of predicted deleterious SNPs in the regulation of traits of economic interest is still unknown. In this study, we used a GWAS to identify regions associated with carcass traits in a meat-type population, then integrated whole genome sequence data to refine the list of candidate mutations, followed by targetted resequencing to identify predicted deleterious SNPs and association study to identify putative causal mutations.

Experimental Population
The TT Reference population used in this study was generated from the Brazilian TT broiler line, developed by the Embrapa Swine and Poultry National Research Center. This line has been under multiple-trait index selection since 1992, representing many generations, with the goals of increasing body weight and carcass yield, improving viability, fertility, hatchability, feed conversion, and reducing abdominal fat (Do Rosário et al., 2009). The TT Reference Population was developed from expansion of the TT selection line comprising crossing of 20 males with 92 females in five hatches, yielding 1,430 chickens (Da Cruz et al., 2015;Marchesi et al., 2017).

Phenotype Measurement
Body weight at 42 days of age (BW42) was measured six hours after fasting, then chickens were euthanized by cervical dislocation followed by bleeding. Blood samples were collected for DNA extraction during bleeding. Feathers were mechanically removed following a hot water bath (60 • C for 45 s) and carcass cuts representing breast weight (BTW), thigh weight (THW), drumstick weight (DRW) and abdominal fat weight (ABFW) were individually measured in grams. Drumstick yield (DR%), abdominal fat yield (ABF%), thigh yield (TH%) and breast weight yield (BT%) were estimated as a percentage of live body weight at 42 days of age. More details about the slaughter and phenotype measurements have been previously described (Venturini et al., 2014;Da Cruz et al., 2015).

DNA Extraction and Genotyping
Genomic DNA was extracted from 1,430 blood samples with the PureLink R Genomic DNA kit (Invitrogen, Carlsbad, CA, United States) and quantified using Qubit R 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, United States). For genotyping analyses, we used the 600 K Affymetrix Axiom TM Genotyping Array (Affymetrix, Inc., Santa Clara, CA, United States). Sample filtering parameters were DishQC ≥ 0.82 performed with Axiom TM Analysis Suite (Affymetrix R ) software and sample call rate ≥ 90% via PLINK v.1.9 software (Purcell et al., 2007). For loci, the filter parameters of call rate ≥ 98% and minor allele frequency (MAF) ≥ 2% were used. We also excluded SNPs with significant deviations from Hardy-Weinberg equilibrium (HWE) (p-value < 0.000001), those located in the sex chromosomes, and those not annotated in the chicken assembly (Gallus_gallus-5.0, NCBI). More details about the genotyping and filtering steps are in Moreira et al. (2018b). Genomic analysis workflow is presented in Figure 1.

Genome Wide Association Analysis
We used GenSel software for GWAS based on a Bayesian methodology approach for genomic prediction. Following Cesar et al. (2014) and Moreira et al. (2018b), a Bayes C model was used to estimate the genetic and residual variances for each trait that were used as priors in fitting a Bayes B model. The mathematical model used was: As described by Moreira et al. (2018b), y represents the vector of phenotypic values (BTW, BT%, THW, TH%, DRW and DR%); X is the incidence matrix for fixed effects; b is the vector of fixed effects; k is the number of SNP loci; a j is the column vector representing the SNP at locus j as a covariate, coded with the number of B alleles; β j is the random substitution effect for locus j assumed to be normally distributed N (0, σ 2 βj ) when δj = 1 but 0 when δj = 0, with δj being a random variable 0/1 indicating the absence (with probability π) or presence (with probability 1π) of locus j in the model, and e is the residual associated with the analysis. Sex and hatch were included as fixed effects in the model and BW42 (slaughter age) as a fixed covariate for THW, BTW, ABFW and DRW.
We assumed π = 0.9970 in the BayesB models and obtained 41,000 Markov chain Monte Carlo (MCMC) samples with the first 1,000 samples being discarded. A map file was used to position the consecutive markers into 947 non-overlapping 1 Mb windows. So, for each window, the proportion of the genetic variance explained by the QTL was computed among individuals for every 100 th iteration of the MCMC chain based on the marker effects sampled in that iteration as performed by Schurink et al. (2012). The windows that had the marker with higher model frequency in the MCMC iterations had their effect predicted as mentioned by Van Goor et al. (2015). Each window is expected to explain 0.1054% of the genetic variance (100%/947) based on an infinitesimal model (Onteru et al., 2013;Van Goor et al., 2016). Were further considered as significant windows those that explained five times more variation than expected (0.53%) in an infinitesimal model, as used by Moreira et al. (2018b).

SNP Selection and Custom Amplicon Design
Among the 20 founding males of the TT Reference population, 14 were resequenced by our group, resulting in approximately 13X sequencing coverage using a HiScanSQ (Illumina) sequencer and that produced a dataset of good quality SNPs we deposited in the EVA-EMBL database 1 . Further details about library preparation, sequencing and filtering are already published (Boschiero et al., 2018;Moreira et al., 2018b). Functional annotation of the SNPs was performed using VEP [Variant Effect Predictor, v.86, (McLaren et al., 2016)] and deleterious predictions were based on SIFT scores (Ng and Henikoff, 2003). To investigate the segregation of the predicted deleterious SNPs detected in the 14 founder males, we selected 237 descendents for targeted resequencing. These offspring represent 14 half-sib and 37 full-sib families. In each of the full-sib families, five to seven animals were re-sequenced. For the genotyping by sequencing methodology, a region of 150 bp centered around each predicted deleterious SNP was used to define a target region for amplicon design (DesignStudio online platform from Illumina).
This study investigated only predicted deleterious SNPs located within significant QTL regions associated with the carcass traits and abdominal fat traits (Moreira et al., 2018b), to narrow the causative mutation search. To avoid overlaps between the amplicons and maximize the number of regions sequenced, we used the Tagger tool in Haploview software (Barrett et al., 2005) to select one SNP per haplotype block of interest. Thus, if adjacent predicted deleterious SNPs exhibited r 2 > 0.7, just one potential causative SNP was chosen to be genotyped by sequencing in the offspring generation.

Target Sequencing
Genomic DNA of 237 offspring was extracted using the PureLink R Genomic DNA kit (Invitrogen, Carlsbad, CA, United States) and then quantified using a Qubit R 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, United States). DNA integrity was evaluated in 1% agarose gel. Library preparation was performed according to Truseq R Custom Amplicon Low Input Kit Reference Guide (Illumina Technology). Libraries were quantified with quantitative real time PCR, using KAPA R Library Quantification kit (KAPA Biosystem) and fragment size was estimated using either Bioanalyzer R (Agilent Technologies) or a Fragment Analyzer (Advanced Analytical Technologies). Paired-end sequencing with a read length of 150 bp was performed on a MiniSeq TM (Illumina Technology).

Data Analyses, Variant Calling and Functional Annotation of Sequencing Data
We aligned the raw sequence reads against the chicken reference genome Gallus_gallus5.0 (NCBI) with BWA-MEM (v.0.7.15). For SNP calling, we used SAMtools v.1.3.1 (Li et al., 2009) with mpileup option (Li, 2011), and filtered based on mapping and base qualities (Phred) ≥ 20. The variant calling was performed after sequence reads from all 237 animals were pooled together. After the initial variant identification, we applied the following filtering options: INDEL removal, minor allele frequency (MAF) ≥ 0.05, SNP call rate ≥ 0.7, biallelic locus, sequencing depth ≥ 15 and Phred score quality ≥ 40.
After filtering, we annotated those SNPs that remained using the VEP tool version 91 (McLaren et al., 2016) available from the Ensembl v.91 website (Zerbino et al., 2018), where the SIFT score was predicted. We searched for gene ontology terms in Amigo 2 GO online server 2 . Deviations from Hardy-Weinberg equilibrium were calculated in Haploview software (Barrett et al., 2005) and Bonferroni multiple test correction was applied.

Association Analysis
For association analysis, we tested 18 predicted deleterious SNPs located in QTL regions using the package lmerTest 3.1-0 (Kuznetsova et al., 2017) in R software (v. 3.5.2) 3 . To investigate the SNP effects, we fitted both additive and non-additive models for each trait (BTW, BT%, THW, TH%, DRW, DR%, ABFW, and AB%). As adopted by Edwards et al. (2015), for the additive models the SNPs were considered as continuous variables (0, 1, or 2 copies of a given SNP) and in the non-additive models the variable genotypes were considered as a factor (with three levels i.e. aa, Aa, AA). Sex and hatch were added in the model as fixed effects, and family (determined by the mother) was fitted as a random effect. For the carcass weight traits (BTW, THW, DRW and ABFW), BW42 was used as a fixed covariate. For each trait, all the SNPs located in the QTL associated with the respective trait were simultaneously fitted in the following model: Where y is the vector of observations for the measured phenotype; X is the incidence matrix relating the fixed effects for sex and hatch to y; β is the vector of sex and hatch fixed effects; W is the genotype matrix for the predicted deleterious SNPs located in the associated QTL regions; and a is the vector of fixed effects for the SNPs. The matrix Z is an incidence matrix relating u to y; u is the vector of length 41 representing the random effect of dam family; and e is the vector of residual effects. Associations were considered significant at comparison-wise p-value < 0.05.

Phenotype Measures and Descriptive Statistics
Descriptive statistics such as number of animals, averages and standard errors from traits and animals utilized in GWAS are presented in Table 1. Summary statistics for all traits and animals used in genotyping and single SNP association analysis are presented in Table 2.

SNP Selection and Amplicon Design
We searched for predicted deleterious SNPs located within the QTLs to identify potential causative mutations. We scrutinized SNP data from whole-genome resequencing of 14 founding animals (TT line), approximately 11 million SNPs (Boschiero et al., 2018), concordant with the five selected GWAS windows.
This recovered 89 predicted deleterious SNPs. After linkage disequilibrium pruning in the founder's dataset, 18 uncorrelated predicted deleterious SNPs were selected for amplicon design and association analysis. One predicted deleterious SNP was tested for each of the following regions: GGA 1 (166 Mb) window associated with DRW and DR%, GGA 22 (4 Mb) associated with THW and TH% and GGA 25 (2 Mb) associated with BRW and BR%. For GGA 26 (3 Mb) associated with BRW and BR% and GGA 26 (1 Mb) associated with ABFW and ABF%, seven and eight predicted deleterious SNPs that were not in linkage disequilibrium were tested, respectively.

Amplicon Sequencing, Variant Calling and Functional Annotation
Libraries sequenced using MiniSeq produced on average 298,791 raw reads per amplicon. The average overall mapping rate of the raw reads against the Gallus_gallus5.0 (NCBI) genome assembly was 99.74%. After variant calling, quality control and functional annotation, the 18 predicted deleterious SNPs were genotyped by sequencing with an average coverage of 7,000X. Detailed information of the 18 predicted deleterious SNPs (genome position, SNP ID, located gene, allele and genotype frequencies, HWE test and SIFT score) is in Table 4. The 18 predicted deleterious SNPs were detected in the offspring, validating our whole genome sequence data and SNP calling. Two SNPs are novel so did not have rs IDs, and four SNPs were annotated in novel genes. Seven SNPs did not have any animal genotyped as homozygous for the alternative allele, and five SNPs exhibited significant deviation from HWE.

Association Analysis for Additive and Dominance Models
We used both additive and non-additive models to verify if any of the deleterious SNP identified in each of the candidate genes were associated with the phenotype detected in the GWAS analysis. Detailed results for association tests (p-values and effects) between the SNPs and the eight traits are presented in Table 5 for the test for additive effects and Table 6 for the test for dominance. It is important to keep in mind that, for each trait, the association test was performed only with the SNPs located in the QTL region chosen for the respective trait. Based on the additive model, from the 18 predicted deleterious SNPs tested, c.482C > T was the only locus significantly associated with ABFW and ABF% ( Table 5). In the analysis testing for dominance using a nonadditive model, there were no SNPs with significant effects for any of the eight traits ( Table 6).
The SNP c.482C > T is in the Myosin Binding Protein H (MYBPH) gene and within the QTL region for ABFW and ABF% traits in GGA26. This polymorphism is a C > T change, resulting in the amino acid change of threonine to methionine at amino acid position 161 (T161M). In the additive model, the effects of the tested predicted deleterious SNP c.482C > T were −0.17% and −3.43g for ABF% and ABFW traits, respectively. Functional annotation analysis performed in Amigo 2 GO of this gene reported gene ontology (GO) terms related to protein binding (GO:0005515), regulation of striated muscle contraction (GO: 0006942), cell adhesion (GO:0007155), structural constituent of muscle (GO:0008307) and myosin filament (GO:0032982).

DISCUSSION
The economic importance of carcass weight and yield motivated the search for genomic regions and candidate genes associated with these traits (Coutinho et al., 2010;Boschiero et al., 2013;Van Goor et al., 2015;Derks et al., 2018). Combining functional studies and association analyses helped us to focus on positionally supported polymorphisms with the main goal of finding putative causal mutations for carcass traits. Here we present a report with genomic association analysis combining information of genomic regions and whole genome sequence to support the selection of predicted deleterious SNPs for association analysis with carcass traits in chickens. From GWAS analysis, 19 unique QTL for THW, TH%, DRW, DR%, BTW and BT%, were detected, cumulatively explaining 0. 54, 1.19, 8.54, 8.11, 2.64, and 4.97% of the genetic variance, respectively, with PPA > 0.69. These QTL are useful for further applications such as positional candidate genes search, gene network analysis and discovery of putative candidate mutations, as we investigated herein.
From the 19 QTL identified, five genomic windows were used here (four identified in this study and one characterized by Moreira et al., 2018b). Whitin these regions, we selected SNPs that had SIFT scores ranging from 0.0 to 0.04. Lower scores represents higher confidence and better sensitivity for detecting deleterious SNPs (Ng and Henikoff, 2003). All SNPs had an amino acid change prediction considered as moderate impact; no substitution resulted in a premature stop codon.
Deviation from HWE is an indication that allele frequency is not consistent with genotype frequency. Several evolutionary Effect values (bottom number) are presented in grams (g) for weight traits and percentage (%) for yield traits. Empty spaces demonstrate that no association test between SNP and phenotype was performed, since we only tested deleterious SNPs with their respective phenotype discovered in the GWAS analysis. It is pertinent to note that even though rs738655377 was not significantly associated with BTW or BT%, none of the animals genotyped were homozygous for the alternative allele, and the test for departure from HWE was significant. This variant is within a novel gene (ENSGALG00000028858) that presents gene ontology terms related to oxidoreductase activity. Out of the 237 animals genotyped, the expectation based on the observed allele frequency was that 4.45 animals would be homozygous for the alternative allele. The lack of homozygous animals for the alternative allele provides suggestive evidence of a lethal polymorphism when in homozygosity. However, further genotyping studies should be performed to confirm these findings based on matings between heterozygotes. Previous studies in cattle and pigs have identified lethal mutation by searching for lack of homozygous alleles (VanRaden et al., 2011;Kuehn et al., 2013;Derks et al., 2017;Jenko et al., 2019).
The significantly associated polymorphism, c.482C > T, located in the MYBPH gene is a novel SNP that caused a threonine-to-methionine substitution at amino acid position 161 (T161M). Threonine amino acid is polar (uncharged), while methionine is a non-polar amino acid. In Myosin-binding protein H, the substitution is located in the Fibronectin type-II domain (UniProt accession number Q05623 and PROSITE accession number PS50853) and this domain is the most common of the fibronectin subdomains (Hynes, 1990). Fibronectin is a dimeric glycoprotein involved in cell adhesion, cell morphology, cell migration, and embryonic differentiation.
The MYBPH presents gene ontology terms related to muscle growth and development such as: protein binding, regulation of striated muscle contraction, cell adhesion, structural constituent of muscle and myosin filament. In chickens, an increase in the amount of MYBPH protein in breast muscle is associated with muscle development at 56 days of age (Liu et al., 2016). In cattle, the MYBPH gene was previously shown to be associated with intramuscular fat deposition (Poleti et al., 2018;Bazile et al., 2019). A proteomic study performed in Longissimus dorsi samples in cattle found that MYBPH protein was significantly more abundant (p-value < 0.05) in a group of cattle exhibiting low intramuscular fat content (Poleti et al., 2018). Another bovine proteomic study found a significantly lower abundance of MYBPH protein in the high adiposity group (Bazile et al., 2019). Other studies with Nellore cattle found MYBPH gene differentially expressed when comparing groups with high and low intramuscular fat content (Cesar et al., 2015;dos Santos Silva et al., 2019). In chickens, the MYBPH was expressed in skeletal muscle tissue, heart, spleen, brain, kidney and female gonad (UniProt Databse, Bateman et al., 2021). Based on the association results in our population, the allele substitution effect of the predicted deleterious SNP was a reduction in 3.43 g of ABWF and 0.17% of ABF% for each copy of the alternative allele in the genotype.
It is important to mention that the polymorphism c.482C > T herein might not be the causal mutation because it was in strong linkage disequilibrium (LD block GGA26 1008728 Mb -1015596 Mb with 0.92 r 2 ) with two other polymorphisms in the MYBPH gene, rs313948592 (A/C, located in the intron) and rs312491203 (G/A change, located upstream of the gene). We are aware that polymorphism in intron or upstream regions can have regulatory impact and we cannot rule predicted deleterious SNP as causative mutations (Jo and Choi, 2015;Li et al., 2020). Nevertheless, this study indicates that MYBPH gene could have an impact on fat deposition in chickens. Further functional studies may help to elucidate the role of MYBPH and its polymorphisms in regulating fatness in meat-type chickens. Also, the other polymorphisms identified in the target regions are available at European Variation Archive (EVA) -EMBL-EBI (accession PRJEB25004) for further studies of causal mutations in these regions.
Our strategy has provided novel evidence in the identification of a putative causative mutation associated with abdominal fatness in meat-type chickens. We report one predicted deleterious SNP associated with abdominal fat traits located in the MYBPH gene that is a candidate gene for fat deposition. The main limitation of our study is not being able to discriminate whether the identified mutation is the causative mutation, or it is simply in strong linkage disequilibrium with the real causal mutation. Nevertheless, our findings can be applied in poultry breeding programs focused on carcass traits improvement.

DATA AVAILABILITY STATEMENT
All data generated or analyzed during this study are public and included in this published article. The SNP report (identified by sequencing) was submitted to European Variation Archive (EVA) -EMBL-EBI, accession number PRJEB25004. The datasets used and/or analyzed during the current study (genotypes and phenotypes) are available from the corresponding author on reasonable request.

ETHICS STATEMENT
In this study, all experimental protocols that used animals were performed in agreement with resolution number 010/2010 approved by the Embrapa Swine and Poultry Ethics Committee on Animal Utilization (CEUA) in Concordia, Santa Catarina State -South of Brazil, in agreement with the rules of National Council of Animal Experimentation Control (CONCEA) to ensure compliance with international guidelines for animal welfare.

AUTHOR CONTRIBUTIONS
PT, GCM, CB, ML, GBM, and LC conceived the idea of this research and participated in the experimental design. ML and LC provided the experimental environment, phenotype, and data analysis support. PT, GCM, CB, JP, AC, GRM, and DG performed data analysis. PT drafted the manuscript. PT, GCM, CB, AC, JP, GRM, ML, GBM, DG, and LC collaborated with interpretation, discussion, and writing of the manuscript. All authors have read and approved the final manuscript.

FUNDING
The authors declare that this study received funding from São Paulo Research Foundation (FAPESP) thematic grant 14/08704-0 and Brazilian Agricultural Research Corporation -Embrapa (project number 01.11.07.002.04.02), both founders contributed to experimental environment, phenotype measure, sequencing, genotyping. The TT Reference Population was subsidized by the National Council of Scientific and Technological Development (CNPq) grant number 481755/2007-1 from the Brazilian Government. PT received a fellowship from FAPESP, grant 2016/13589-0. GCM received fellowships from FAPESP, grants 14/21380-9 (in cooperation agreement with CAPES), and 16/00569-1. CB received a fellowship from the program Science without Borders from CNPq grant 370620/2013-5. ML is employee of the Brazilian Agricultural Research Corporation -Embrapa. L.L.C. GRM, ML, and GBM are recipient of productivity fellowship from CNPq.