ORIGINAL RESEARCH article
A Meta-Analysis Including Pre-selected Sequence Variants Associated With Seven Traits in Three French Dairy Cattle Populations
- 1UMR GABI, INRA, AgroParisTech, Université Paris Saclay, 78350 Jouy en Josas, France
- 2Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- 3ALLICE, Paris, France
A within-breed genome-wide association study (GWAS) is useful when identifying the QTL that segregates in a breed. However, an across-breed meta-analysis can be used to increase the power of identification and precise localization of QTL that segregate in multiple breeds. Precise localization will allow including QTL information from other breeds in genomic prediction due to the persistence of the linkage phase between the causal variant and the marker. This study aimed to identify and confirm QTL detected in within-breed GWAS through a meta-analysis in three French dairy cattle breeds. A set of sequence variants selected based on their functional annotations were imputed into 50 k genotypes for 46,732 Holstein, 20,096 Montbeliarde, and 11,944 Normande cows to identify QTL for milk production, the success rate at insemination of cows (fertility) and stature. We conducted within-breed GWAS followed by across-breed meta-analysis using a weighted Z-scores model on the GWAS summary data (i.e., P-values, effect direction, and sample size). After Bonferroni correction, the GWAS result identified 21,956 significantly associated SNP (PFWER < 0.05), while meta-analysis result identified 9,604 significant SNP (PFWER < 0.05) associated with the phenotypes. The meta-analysis identified 36 QTL for milk yield, 48 QTL for fat yield and percentage, 29 QTL for protein yield and percentage, 13 QTL for fertility, and 16 QTL for stature. Some of these QTL were not significant in the within-breed GWAS. Some previously identified causal variants were confirmed, e.g., BTA14:1802265 (fat percentage, P = 1.5 × 10−760; protein percentage, P = 7.61 × 10−348) both mapping the DGAT1-K232A mutation and BTA14:25006125 (P = 8.58 × 10−140) mapping PLAG1 gene was confirmed for stature in Montbeliarde. New QTL lead SNP shared between breeds included the intronic variant rs109205829 (NFIB gene), and the intergenic variant rs41592357 (1.38 Mb upstream of the CNTN6 gene and 0.65 Mb downstream of the CNTN4 gene). Rs110425867 (ZFAT gene) was the top variant associated with fertility, and new QTL lead SNP included rs109483390 (0.1 Mb upstream of the TNFAIP3 gene and 0.07 Mb downstream of PERP gene), and rs42412333 (0.45 Mb downstream of the RPL10L gene). An across-breed meta-analysis had greater power to detect QTL as opposed to a within breed GWAS. The QTL detected here can be incorporated in routine genomic predictions.
Ideally, the magnitude of estimated effects on the quantitative trait of interest could be used to rank single nucleotide polymorphisms (SNP) for a functional genomic study to identify causal variants. The physical locations of the associated SNP on the genome are then flagged and mined for causative variants underlying quantitative trait loci (QTL). Causal variants reported for QTL in previous dairy cattle studies include polymorphisms causing variation in milk production, fertility, and embryo mortality (Grisart et al., 2002; Hoff et al., 2017; Michot et al., 2017; Bouwman et al., 2018). Most analyses testing genomic regions, like those above, perform within-breed GWAS (Chang et al., 2018). Even so, our knowledge regarding causal variants in these genomic regions remains limited because the GWAS analyses yield similar P-values for many adjacent SNP variants, a consequence of linkage disequilibrium (LD). Even with a multitude of GWAS results, strong LD prevents distinguishing the actual causal variant from linked markers (Goddard and Hayes, 2009). A meta-analysis can be used to improve the resolution of QTL detection and identify causal variants provided that LD is conserved at short distances across breeds (van den Berg et al., 2016). The main advantage of a meta-analysis is that it allows simultaneous analysis of many breeds by combining GWAS summary statistics across populations, thereby increasing power to detect QTL (van den Berg et al., 2016; Bouwman et al., 2018).
In this paper, we report an across-breed meta-analysis of GWAS summary statistics based on imputed pre-selected sequence variant genotypes from 78,772 cattle from three dairy breeds and for seven traits. The meta-analysis across breeds allowed us to identify 142 QTL for milk production, stature, and fertility.
Materials and Methods
Studied Population and Phenotypes
We studied phenotypes from three French dairy cattle breeds. No ethical approval was necessary since the data used was from existing databases. Phenotypes were from cows born between 2007 and 2013 in the French dairy production regions and were defined as follows: (1) Production traits including milk, protein, and fat yields were obtained from test-day records expressed as 305 d yield (kg). Fat content (g/kg) was calculated as 1,000*(fat yield/milk yield). Protein content (g/kg) was calculated as 1,000* (protein yield/milk yield). (2) Fertility was defined as success/failure at each insemination of lactating cows and recorded as a success (= 1) or a failure (= 0). (3) Stature was measured as the vertical distance from the plank to the sacrum.
These traits were analyzed using the French national evaluation model (Boichard et al., 2012b) to obtain Yield Deviations (YD) (VanRaden and Wiggans, 1991). A YD can be defined as the mean performance adjusted for all environmental effects, including the permanent environment for cows with multiple records. The model varied according to the trait, especially concerning environmental factors. We retained traits for genotyped cows and this included data from 46,732 Holstein cows, 20,096 Montbeliarde cows, and 11,944 Normande cows.
Genotyping and Imputation
Cows were genotyped using the Illumina Infinium® BovineSNP50 BeadChip (50 K, Illumina, San Diego, CA), BovineLD BeadChip (Boichard et al., 2012a) or one of the first four versions of the EuroG10k SNP chip (Boichard et al., 2018). The EuroG10k SNP chip is composed of two parts: (1) Between 7,000 and 8,500 generic (and supposedly neutral) SNP from BovineLD Genotyping BeadChip v.2 (Boichard et al., 2012a). (2) Whole genome sequence variants-a custom part of up to 7,232 SNP selected from sequence data as part of 1,000 Bull Genomes Project Run 4 (Daetwyler et al., 2014) based on their functional annotation. The full description of the EuroG10k chip and quality control procedure was previously described by Marete et al. (2018a) and Boichard et al. (2018). Cows with phenotypes were rather old animals and were genotyped with the 50 k chip. Younger animals including cows without phenotypes were genotyped with the EuroG10k chip. To get complete marker information across animals, and to include the sequence variants in the analysis, we run two successive steps of imputation for the entire populations (males and females, with and without phenotypes, see Figure 1). First, we imputed all 50 k markers for all animals using animals with 50 k genotypes as a reference. Then we imputed sequence variants using the animals genotyped with the EuroG10k chip as the reference population. The imputation was done within breed using the FImpute software (Sargolzaei et al., 2014). After imputation, 48,576 SNP distributed across 29 Bos taurus autosomes (BTA) remained. Of these, 42,967 were from the 50 k chip, and 5,609 sequence variants selected from French cattle populations. Imputation accuracy was estimated using both concordance rate and allelic squared correlation (r2), both for animals genotyped with 50 k only and those genotyped with EuroG10k chip only. In both instances, we randomly masked 20% of the genotyped markers on each chromosome and 15% of the cows per breed, then performed an imputation and compared the imputed genotypes with true genotypes by estimating a Pearson correlation coefficient. For the 50 k SNP, the average concordance was >0.98, and average allelic r2 was >0.97, and for the sequence variants, the average concordance was >0.95, and average r2 was >0.96 for all breeds.
Figure 1. Genotype imputation workflow. Table showing number of animals genotyped with various chips is included in Supplementary Table 1.
Association Studies, Meta-Analysis, and QTL Heterogeneity
First, 48,576 SNP (pre-selected sequence variants and 50 k SNP) were tested individually for association with each trait within the breed. The GCTA software (Yang et al., 2011) was used to fit a mixed linear regression model to test associations between a SNP and the trait. For any given trait, the fitted model was
where y was a vector of phenotypes (yield deviation) for all cows, 1 was a row vector of 1s, μ was the mean, x was the vector of true or imputed allele dosages for the SNP, β was the fixed allele substitution effect of the SNP, g was the vector of random additive genetic effects with , G being the genomic relationship matrix, was the variance explained by all the SNPs, ε was a vector of random residual effects with , was the error variance, and I was an identity matrix. The variance of y was . The gjk term of G matrix was estimated by
where w was the total number of SNP, zijand zik were numbers of copies of the reference allele for the ith SNP in the jth and kth cows, respectively, and pi was the frequency of the reference allele estimated from the marker data (VanRaden, 2008).
To control the type I error rate and assuming all tests were independent (a conservative assumption due to LD among SNP), a Bonferroni genome-wide correction was applied for the total number of tests for each trait i.e., the nominal type I error rate (α = 5%) was divided by the number of SNP (w = 48,576) to obtain a genome-wide error rate threshold of t = 1.03 × 10−6. Any SNP whose probability of observing the test statistics was below this value was considered genome-wide significant.
The within-breed analyses were followed by a meta-analysis using weighted Z-score model as implemented in METAL software (Willer et al., 2010). The weighted Z-score model used P-values and directions of effect estimates and weights individual GWAS based on the sample size to compute a Z-score, i.e.,
where Zb was the Z-score for breed b, Φ was the standard normal cumulative distribution, Pb was the P-value, Δb was the direction of the SNP effect estimated within breed b. Overall Z-score was calculated as:
where and Nb was the sample size for breed b. The overall P-value was calculated as:
We evaluated heterogeneity of the effect sizes across breeds using Cochran's Q-test (Cochran, 1954) as implemented in the METAL software. A QTL was deemed to segregate across populations if the heterogeneity P-value was ≤ 0.05. The functional consequence of significantly associated variants was predicted using the Variant Effect Predictor tool from Ensembl Genome Browser 90 (McLaren et al., 2016). We classified variants that (i) mapped to a gene, (ii) < 5-kb to known genes, and, (iii) >5-kb from any coding region. In case of multiple variants representing the same gene, we kept the variant associated with most traits and with the lowest P-value as representative for that gene.
Identification of QTL That Segregates Within and Across Breeds
Since it is challenging to identify the causative variant, we aimed at determining the QTL region that most likely harbor the causal variant. First, for each chromosome, we subjectively inspected significantly associated variants within the genomic regions. Secondly, we defined the within-breed QTL region as 1 Mb windows and the SNP with the lowest P-value designated as the lead SNP. A QTL was selected as significant in more than one breed if any of the SNP in the QTL region of the other breed had a P-value lower than the genome-wide threshold (P < 1.03 × 10−6). The UMD3.1 (Zimin et al., 2009) bovine genome annotation was used to annotate genes centering within 1 Mb intervals on the lead SNP. We compared these annotations to known QTL for bovine milk production, fertility and stature traits using QTLdb (Hu et al., 2016) and literature review.
Overall, there were 21,956 SNP significantly associated with traits in three French cattle breeds. Forty-three percent of these associated SNP were from Holstein, 24% from Montbeliarde, 14% from Normande. Seven percent of these SNP showed association in both Holstein and Montbeliarde, 4% in Holstein and Normande, 5% in Montbeliarde and Holstein, and 3% in all three breeds. Production traits and stature had more significant hits across breeds, compared to fertility. As presented in Figures 2A–E, the overlap of significant tests in any two-breed combination and for all traits was most evident between Holstein and Montbeliarde (1,621 SNP) and least evident between Montbeliarde and Normande (1,082 SNP). Seven hundred and one SNP were significantly associated with the same trait in all three breeds.
Figure 2. GWAS summary statistics. Venn diagrams representing significant SNP from GWAS for milk yield (A), fat percentage (B), protein percentage (C), stature (D), and fertility (E). Composite Manhattan and corresponding quantile-quantile (QQ) plots (I) showing the association of imputed sequence variant with fat and protein percentage in Montbeliarde (F), Normande (G), and Holstein (H). Each dot shows the most significant P-value that was observed across both traits. Light green color represents sequence variants with P < 1 × 10−6.
As presented in Figures 2F–I, some of the imputed sequence variants had lower P-values compared with the 50 k variants P-values. Although more than 13% of the pre-selected sequence variants had MAF between 0.5 and 5%, only 2 QTL lead SNP had MAF lower than 5%. We did not detect QTL with MAF < 5% in Normande. Among the pre-selected sequence variants, 1,478 were significant (P < 1.03 × 10−6).
Within Breed GWAS and Multi-Breed Meta-Analysis
We observed both known and novel QTL in all three cattle breeds. The most significant QTL lead SNP from meta-analysis are presented in Table 1. We observed QTL on 16 chromosomes where Holstein had 23 QTL, Montbeliarde had 10 QTL, Normande had 15 QTL. The most significant lead SNP was located at Bos taurus autosome (BTA) 14:1801116 (rs109421300, P = 9.7 × 10−283), an intronic variant near the causal variants in the DGAT1 gene. Other significant QTL were observed at BTA5:50804085 (P = 1.3 × 10−22, intergenic), rs109205829 (BTA8:30088133, P = 2.3 × 10−21, NFIB gene), and BTA20:32670639 (P = 4.0 × 10−41, intergenic flanking 3′ end of the GHR gene).
A QTL on BTA8 at 29–30 Mb with the same lead SNP at BTA8:30088133 (rs109205829) was observed in both Holstein and Montbeliarde. This lead SNP is an intronic variant within the NFIB gene, and the meta-analysis had a lower P-value (P = 2.3 × 10−22) than either of the single breed GWAS (P = 8.4 × 10−14). Another QTL on BTA22 at 23–26 Mb with a lead SNP at BTA22:24219999 (rs41592357) was observed in all three breeds. Rs41592357 is an intergenic variant 1.38 Mb upstream of the CNTN6 gene and 0.65 Mb downstream of the CNTN4 gene. Some QTL overlapped between Holstein and Montbeliarde, some with same lead SNP, but others with different lead SNP, e.g., BTA15 at 57–61 Mb and BTA25 at 10–11 Mb. The lead SNP for the QTL on BTA15 were rs110049689 (BTA15:57333896, P = 1.5 × 10−7, MYO7A gene) in Montbeliarde, and rs41690133 (BTA15:56842162, P = 2.4 × 10−12, intergenic) in Holstein. On BTA25, the lead SNP was rs42062121 (BTA25:11015593, SNX29 gene) in both Holstein (P = 1.1 × 10−7) and Montbeliarde (P = 2.5 × 10−6).
Fat Yield and Percentage.
We observed QTL on 19 chromosomes where Holstein had 26 QTL, Montbeliarde had 12 QTL, Normande had 5 QTL. Meta-analysis confirmed BTA14:1802265 (P = 1.5 × 10−760, DGAT1 gene). Other QTL lead SNP whose meta-analysis P-values exceeded the GWAS significance threshold include: rs110824611 (BTA3:11040167, P = 1.2 × 10−7, OR6K6 gene), rs41616289 (BTA8:93896073, P = 2.9 × 10−7, intergenic), rs109394729 (BTA10:46486647, P = 1.9 × 10−7, USP3 gene), rs41668653 (BTA11:63467507, P = 1.0 × 10−10, intergenic), rs110249976 (BTA15:53166998, P = 4.8 × 10−9, FCHSD2 gene), rs41921177 (BTA19:51326750, P = 8.0 × 10−14, CCDC57 gene), and rs109599512 (BTA27:36117365, P = 9.4 × 10−17, GOLGA7 gene).
We observed QTL overlapping in Holstein and Montbeliarde and in Holstein and Normande, but with different lead SNP. The two lead SNP for QTL significantly associated in both Holstein and Montbeliarde were BTA12:15173534 (P = 3.1 × 10−13, NUFIP1 gene, 14.5–15.6 Mb), and rs41921177 (BTA19:51326750, P = 1.1 × 10−19, CCDC57 gene, 51–52 Mb). The three lead SNP for QTL significantly associated in both Holstein and Normande were rs109394729 (BTA10:46486647, USP3 gene, 43–47 Mb), rs109599512 (BTA27:36117365, GOLGA7 gene, and 35.5–38.5 Mb), and rs41668653 (BTA11:63467507, intergenic, 61.2–64.9 Mb).
Protein Yield and Percentage.
We observed QTL on 22 chromosomes where Holstein had 31 QTL, Montbeliarde had 18 QTL, Normande had 22 QTL. We confirmed already known causal variants, BTA6:87199843 (P = 9.5 × 10−31, CSN2 gene), and BTA14:1802265 (P = 8.0 × 10−78, DGAT1 gene). As presented in Table 4, new QTL lead SNP whose meta-analysis P-value exceeded the within breed GWAS significance levels included: rs41593345 (BTA9:45303296, P = 4.3 × 10−18, PREP gene), rs41668653 (BTA11:63467507, P = 7.5 × 10−48, RAB1A gene), and rs109882115 (BTA18:58067310, P = 9.0 × 10−28, CEACAM18 gene). Two QTL overlaps were observed between Holstein and Montbeliarde, on BTA6 (85–88 Mb) with the most significant lead SNP at BTA6:87199843 (HSTN), and on BTA21 with lead SNP. rs29011638 (BTA21:37684315, P = 3.8 × 10−7, intergenic). We observed two QTL overlapping between Holstein and Normande on BTA7 (13–14 Mb) with the most significant SNP, rs41568613 (BTA7:13526016, P = 7.7 × 10−6, intergenic), and on BTA26 (21.5–24.5 Mb) with lead SNP, rs29014382 (BTA26:22339074, P = 1.8 × 10−9, intergenic). We observed two QTL overlapping among all three breeds on BTA16 (1–2 Mb) with lead SNP, rs42450080 (BTA16:1608132, P = 1.2 × 10−10, intergenic), and on BTA23 (50–51 Mb) with lead SNP, rs42507912 (BTA23:51118713, P = 1.3 × 10−11, GMDS gene). QTL overlap between fat and protein was evident, e.g., on BTA11 at 61–64 Mb with its lead SNP at rs41668653 (BTA11:63467507, P = 7.2 × 10−13, intergenic), and on BTA14 at 1.6–1.9 Mb with the most significant lead SNP being BTA14:1802265 (P = 2.6 × 10−322, DGAT1 gene).
For the success rate at insemination of lactating cows, we observed QTL on 9 chromosomes where Holstein had 13 QTL, Montbeliarde had 2 QTL, Normande had 1 QTL. As presented in Table 5, the lead SNP for new QTL from meta-analysis that exceeded the genome wide threshold (P < 1.03 × 10−6) included rs109483390 (BTA9:76868290, P = 5.2 × 10−10, 0.1 Mb upstream of TNFAIP3 gene and 0.07 Mb downstream of PERP gene), rs42412333 (BTA10:39278374, P = −1.2 × 10−8, 38–40 Mb, 0.45 Mb downstream of RPL10L gene), and rs110425867 (BTA14:8264685, P = 1.6 × 10−11, ZFAT gene 8.1–8.3 Mb). Lead SNP and QTL intervals for QTL observed only in Holstein were: rs41589904 (BTA8:76582220, P = 7.5 × 10−9, UBE2R2 gene, 74–77 Mb), rs41593363 (BTA9:53806121, P = 1.3 × 10−8, 49–55 Mb, and 6.5 kb upstream of GPR63 gene), rs110543856 (BTA18:48150900, P = 1.1 × 10−18, SIPA1L3 gene, 43–48 Mb), rs110588160 (BTA21:47117318, P = 2.6 × 10−14, intergenic, 43–48 Mb), rs29023151 (BTA22:23303686, P = 1.6 × 10−11, IL5RA gene, 22–24 Mb), rs109629413 (BTA24:33624891, P = 7.1 × 10−15, TMEM241 gene, 28–33 Mb), and two, QTL on BTA26: rs43158237 (BTA26:6741237, P = 2 × 10−11, intergenic, 6.7–6.9 Mb) and rs42096924 (BTA26:33443386, P = 2.8 × 10−11, intergenic, 24–33 Mb). Lead SNP and QTL intervals for QTL observed only in Montbeliarde were: rs41657531 (BTA9:39816061, 1.1 × 10−8, RPF2 gene, 38–44 Mb), and rs42417896 (BTA11:34068419, 4.6 × 10−6, intergenic, 31–39 Mb). The suggestive fertility QTL observed in Normande was rs41570140 (BTA11:30065164, 1.6 × 10−4, FBX011 gene, 27–30 Mb).
We identified 16 QTL on 11 chromosomes associated with stature in the three breeds (Table 5). The most significant variant was observed in the Montbeliarde breed and was BTA14:25006125 (P = 8.6 × 10−140 PLAG1 gene). Seven novel QTL for stature displayed heterogeneity across the three breeds, e.g., rs43121344 (BTA7:68281468, intergenic and 0.14 Mb toward the 3′-flanking region of MRPL22 gene) was lead SNP in all three breeds with PNOR = 3.0 × 10−4, PMON = 4.5 × 10−8, PHOL = 1.1 × 10−9, and PMETA = 7.6 × 10−19. Lead SNP for breed specific QTL in Montbeliarde include: rs41645172 (BTA2:109778246, P = 3.9 × 10−17, DOCK10 gene), rs109860141 (BTA10:51558941, P = 2.4 × 10−20 FAM63B gene), BTA14:25006125 (P = 8.6 × 10−140, PLAG1 gene), and BTA20:39917111 (P = 5.0 × 10−22, ADAMTS12 gene). Lead SNP for QTL observed in Normande include rs41644660 (BTA26:19460476, P = 3.5 × 10−8, 73 kb downstream of HPS1 gene). Two QTL lead SNP observed in Holstein include rs110652403 (BTA7:90460692, P = 4.8 × 10−12, 0.16 Mb upstream of MEF2C gene, and 0.3 Mb downstream of TMEM161B gene), and rs43430263 (BTA12:53241975, P = 3.3 × 10−14, SLAIN1 gene).
Meta-Analysis Heterogeneity Among Three Breeds
As an example of the heterogeneity observed from a meta-analysis of the three breeds, we observed peaks from a combined Manhattan plot for fat and protein percentage (Figures 3A,B) and standardized the allelic substitution effects by the phenotypic standard deviation of fat and protein percentage meta-analysis results. We observed heterogeneity (Cochran's Q, P < 0.05) for 36 QTL from the meta-analysis in three breeds (Figure 3D). We then overlay observed breed specific QTL for these two traits and Holstein had 26 QTL in 24 chromosomes, Montbeliarde had 17 QTL in 14 chromosomes, and Normande had 16 QTL in 7 chromosomes (Figure 3C).
Figure 3. Heterogeneity of SNP association with fat and protein in milk across three cattle breeds. (A) Composite Manhattan plot that shows the association of 43,421 imputed variants including 5,609 pre-selected sequence variants with fat and protein in the meta-analysis. The composite Manhattan plot summarizes the results of the meta-analyses with each dot representing the more significant P-value that was observed across both traits. Light green represents sequence variants with P < 1 × 10−6. (B) Quantile-quantile plot of the meta-analyses. Green and brown color represent P-value of 43,421 imputed variants for fat and protein, respectively. (C) Overview of 59 QTL that were significant at P < 1 × 10−6 in the meta-analysis and within-breed association studies. Each column represents one of 29 Bos taurus autosomes. Row colors are breed specific with the top row being overall meta-analysis QTL. Filled squares indicate that QTL were significant in the respective breeds and chromosome. (D) Allelic substitution effects of 36 QTL on fat and protein standardized with the phenotypic standard deviations. The vertical axis is protein, and the horizontal axis is fat.
Our meta-analysis of association summary statistics for seven traits across three French dairy cattle breeds discovered 120 QTL including 13 QTL that had not been detected at P < 1.03 × 10−6 in the within-breed analyses. In agreement with previous studies (Pausch et al., 2017) our results show that combining GWAS summary data from several breeds increases the power of association studies. However, for lowly heritable traits, and for distinct populations, a meta-analysis may not be as robust as the within-breed GWAS (Raven et al., 2014). For instance, within-breed GWAS showed associated SNP for fertility had no overlap between breeds probably due to small effects per locus (i.e., low power) and possibly the action of many, reasonably rare recessive lethal alleles. Holstein, with larger sample size, had associated SNP in strong LD and thus had stronger fertility predicted effects. In some instances, the meta-analysis increased the power, and consequently, the confidence interval for some fertility related QTL became visible, e.g., QTL on BTA2, 8 and 20 (Table 4). However, in other instances, the QTL detection power reduced when the QTL are private, e.g., the meta-analysis for fertility lost power compared to the within-breed GWAS when the Normande breed (with the smallest number of animals) was included in the analysis, e.g., QTL on BTA15 and 24 (Table 4).
One advantage of a meta-analysis over within-breed GWAS is that it is expected to result in more precision in QTL position due to the breakdown of long-range LD due to many generations of recombination since the breeds were separated (Raven et al., 2014; Pausch et al., 2016). In this study, we performed a within-breed GWAS first, followed by meta-analyses of GWAS summary statistics from three breeds using a weighted Z-score model to get the P-value for each SNP. In our study, the meta-analysis increased the P-value for some variants, probably because the QTL was not segregating across populations. Meta-analysis decreased P-value for some variants, and consequently, some QTL peaks became more distinct and narrower. For example, a region on BTA6:87296809 was strongly associated with protein yield and mapped near the protein-coding genes ODAM and CSN3 in a 120 kb region. Bovine odontogenic, ameloblast-associated (ODAM) participates in structuring the extracellular matrix to attach epithelial cells to mineralized surfaces thus forming a protective seal that is antagonistic to bacterial invasion (Fouillen et al., 2017). On the other hand, Casein kappa (CSN3) is associated with daily milk yield (Bonfatti et al., 2011; Bartonova et al., 2012) as well as better cheese properties (Di Gregorio et al., 2017).
Strongly associated SNP (P < 1 × 10−50) were identified in all three French dairy breeds. Among these were several SNP that had previously been described such as those close to the DGAT1-K232A mutation on BTA14 (Grisart et al., 2002). These SNP were highly associated with all milk production traits in all three breeds. A very significant sequence variant at BTA5:93948357 (rs209372883) was associated with protein and fat in Holstein and Montbeliarde; it has previously been ascribed to a variant in MGST1 (microsomal glutathione S-transferase 1; Raven et al., 2016), an upstream intron variant. MGST1 is also an inflammation response gene which is highly expressed through pregnancy and lactation (Church et al., 2012). Previous studies in Japanese Black cattle (Wang et al., 2005) suggested upregulation of MGST1 during adipocyte development in the longissimus muscle. Slightly upstream, a highly significant peak was centered within SLC15A5, a protein-coding gene. Our results suggest that both MGST1 and SLC15A5 may contain variants affecting milk production. Raven et al. (2014) identified a fat yield QTL within 3kbp of MGST1 using a multi-breed analysis and we report a QTL which was within 2kbp of the genic region of MGST1 using meta-analysis. Similar studies in Canadian Holstein, and using a similar density chip (50 k), reported a QTL 200 bp from that reported in our study (Li et al., 2010). The same Canadian Holstein study reported several fat yield candidate genes including SLC2A3 and GDF3 at 101.7–101.8 Mb and LRP6, EMP1, and DUSP16 at 97–98 Mb on BTA5 (Li et al., 2010). Their results agrees with our meta-analysis results. Another very significant QTL for milk yield in Holstein was identified on BTA20 at ~38 Mb. The lead SNP of this QTL was located near the PRLR gene which has previously been associated with fertility in Holstein cows (Leyva-Corona et al., 2018; Marete et al., 2018b). The majority of the associated SNP with fertility, a low heritability trait, were 50 k chip variants. For instance, rs110992367, the lead SNP for a 3.3 Mb QTL for the success rate of insemination in Holstein cows was 0.38 Mb from the 5′ flanking region of the G2E3 gene. Meta-analysis heterogeneity for this QTL was observed in all three breeds (P = 9.5 × 10−3), but opposite in the effect direction between Holstein and the other two breeds. The heterogeneity is important because the G2E3 gene is an essential protein-coding gene that prevents apoptotic death during embryonic development (Brooks et al., 2008). This gene has been linked to apoptosis and cellular stress in US beef cattle (Howard et al., 2014) and somatic cell score in Chinese Holstein (Wang et al., 2015). This gene is a good candidate affecting cow fertility.
All QTL identified by meta-analysis were not significant in all within-breed analysis (Tables 1–6, Figure 3C) probably due to low power to reach significance at P < 1 × 10−6 or lack of segregation in the breed (Sham and Purcell, 2014). For instance, the variant mapping to the PLAG1 gene only segregated in the Montbeliarde breed. However, most of the SNP had an effect in the same direction in all breeds for the more significant variants, e.g., BTA18:58067310 (CEACAM18 gene, fat) and BTA10:51558941 (FAM63B gene, stature). This suggests that most detected QTL segregate in all three breeds even though they were not significant in the within-breed analysis. In this study, pre-selected sequence variants were directly genotyped or imputed from a large reference population genotyped with the EuroG10k chip, and thus were known with very high accuracy. This was evident from the high average allelic correlation (r2 = 0.97) between real and imputed genotypes for the three breeds. In conclusion, many QTL segregated across the three French cattle breeds more so for highly heritable traits. The meta-analysis across the breeds using the within-breed association summary data increased the power of QTL detection. This QTL information can be incorporated in routine genomic evaluation in French populations. If an independent population of dairy cattle is genotyped with the EuroG10k chip, then the true causal variant underlying the QTL may be validated in independent population/breed. The next step of this study is to use independent Nordic cattle data to validate the QTL reported here.
AM analyzed data and wrote the manuscript. BG contributed to the design of the custom part of the EuroG10k chip, interpretation of results and revision of the manuscript. ML worked on the study design, secured project funding and revised the manuscript. SF contributed to the design of the custom part of the EuroG10k chip and the management of the marker information. GS interpreted the results and revised the manuscript. DB worked on the study design, interpretation of results and revision of the manuscript.
AM is a recipient of an Erasmus Mundus Ph.D. grant funded jointly by European Graduate School in Animal Breeding and Genetics (France), and Center for Genomic Selection in Animals and Plants (GenSAP) supported by Innovation Fund Denmark (grant 0603-00519B).
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We are grateful to French National Animal Breeding Database for providing phenotypic data, Valogene for access to genotypes and INRA for providing study environment. We also acknowledge Chris Hozé for initial genotype preparation.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00522/full#supplementary-material
Bartonova, P., Vrtkova, I., Kaplanova, K., and Urban, T. (2012). Association between CSN3 and BCO2 gene polymorphisms and milk performance traits in the Czech Fleckvieh cattle breed. Genet. Mol. Res. 11, 1058–1063. doi: 10.4238/2012.April.27.4
Boichard, D., Boussaha, M., Capitan, A., Rocha, D., Sanchez, M. P., et al. (2018). “Experience from large scale use of the EuroGenomics custom SNP chip in cattle,” in 11th World Congress of Genetics Applied to Livestock Production (Auckland), 675.
Boichard, D., Chung, H., Dassonneville, R., David, X., Eggen, A. A., Fritz, S. S., et al. (2012a). Design of a bovine low-density snp array optimized for imputation. PLoS ONE 7:e34130. doi: 10.1371/journal.pone.0034130
Bonfatti, V., Cecchinato, A., Di Martino, G., De Marchi, M., Gallo, L., and Carnier, P. (2011). Effect of κ-casein B relative content in bulk milk κ-casein on Montasio, Asiago, and Caciotta cheese yield using milk of similar protein composition. J. Dairy Sci. 94, 602–613. doi: 10.3168/jds.2010-3368
Bouwman, A. C., Daetwyler, H. D., Chamberlain, A. J., Hurtado Ponce, C., Sargolzaei, M., Schenkel, F. S., et al. (2018). Meta-analysis of genome wide association studies for the stature of cattle reveals common genes that regulate size in mammals. Nat. Gen. 50, 362–367. doi: 10.1038/s41588-018-0056-5
Brooks, W. S., Helton, E. S., Banerjee, S., Venable, M., Johnson, L., Schoeb, T. R., et al. (2008). G2E3 is a dual function ubiquitin ligase required for early embryonic development. J. Biol. Chem. 283, 22304–22315. doi: 10.1074/jbc.m803238200
Daetwyler, H. D., Capitan, A., Pausch, H., Stothard, P., van Binsbergen, R., Brøndum, R. F., et al. (2014). Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat. Genet. 46, 858–865. doi: 10.1038/ng.3034
Di Gregorio, P., Di Grigoli, A., Di Trana, A., Alabiso, M., Maniaci, G., Rando, A., et al. (2017). Effects of different genotypes at the CSN3 and LGB loci on milk and cheese-making characteristics of the bovine Cinisara breed. Int. Dairy J. 71:43105. doi: 10.1016/j.idairyj.2016.11.001
Fouillen, A., Dos Santos Neves, J., Mary, C., Castonguay, J.-D., Moffatt, P., Baron, C., et al. (2017). Interactions of AMTN, ODAM and SCPPPQ1 proteins of a specialized basal lamina that attaches epithelial cells to tooth mineral. Sci. Rep. 24:46683. doi: 10.1038/srep46683
Grisart, B., Coppieters, W., Farnir, F., Karim, L., Ford, C., Berzi, P., et al. (2002). Positional candidate cloning of a QTL in dairy cattle: identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition. Genome Res. 12, 222–231. doi: 10.1101/gr.224202
Howard, J. T., Kachman, S. D., Snelling, W. M., Pollak, E. J., Ciobanu, D. C., Kuehn, L. A., et al. (2014). Beef cattle body temperature during climatic stress: a genome-wide association study. Int. J. Biometeorol. 58, 1665–1672. doi: 10.1007/s00484-013-0773-5
Leyva-Corona, J. C., Reyna-Granados, J. R., Zamorano-Algandar, R., et al. (2018). Polymorphisms within the prolactin and growth hormone/insulin-like growth factor-1 functional pathways associated with fertility traits in Holstein cows raised in a hot-humid climate. Trop. Anim. Health Prod. doi: 10.1007/s11250-018-1645-0. [Epub ahead of print].
Li, H., Wang, Z., Moore, S. S., Schenkel, F. S., and Stothard, P. (2010). :Genome-wide scan for positional and functional candidate genes affecting milk production traits in Canadian Holstein Cattle,” in Proc 9Th WCGALP, Genetics of Trait Complexes: Lactation - Lecture Sessions: 0535 (Leipzig). Available online at: http://www.wcgalp.org
Marete, A., Lund, M. S., Boichard, D., and Ramayo-Caldas, Y. (2018a). A system-based analysis of the genetic determinism of udder conformation and health phenotypes across three French dairy cattle breeds. PLoS ONE 13:e0199931. doi: 10.1371/journal.pone.0199931
Marete, A., Sahana, G., Fritz, S., Lefebvre, R., Barbat, A., Lund, M. S., et al. (2018b). Genome-wide association study for milking speed in French Holstein cows. J. Dairy Sci. 101, 6205–6219. doi: 10.3168/jds.2017-14067
Michot, P., Fritz, S., Barbat, A., Boussaha, M., Deloche, M. C., Grohs, C., et al. (2017). A missense mutation in PFAS (phosphoribosylformylglycinamidine synthase) is likely causal for embryonic lethality associated with the MH1 haplotype in Montbéliarde dairy cattle. J. Dairy Sci. 100, 8176–8187. doi: 10.3168/jds.2017-12579
Pausch, H., Emmerling, R., Grendler-Grandl, B., Fries, R., Daetwyler, D., and Goddard, M. E. (2017). Meta-analysis of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein percentages in milk at nucleotide resolution. BMC Genomics 18:853. doi: 10.1186/s12864-017-4263-8
Pausch, H., Emmerling, R., Schwarzenbacher, H., and Fries, R. (2016). A multi-trait meta-analysis with imputed sequence variants reveals twelve QTL for mammary gland morphology in Fleckvieh cattle. Genet. Sel. Evol. 48:14. doi: 10.1186/s12711-016-0190-4
Raven, L. A., Cocks, B. G., and Hayes, B. J. (2014). Multibreed genome wide association can improve precision of mapping causative variants underlying milk production in dairy cattle. BMC Genomics. 15:62. doi: 10.1186/1471-2164-15-62
Raven, L. A., Cocks, B. G., Kemper, K. E., Chamberlain, A. J., Jagt, C. J. V., Goddard, M. E., et al. (2016). Targeted imputation of sequence variants and gene expression profiling identifies twelve candidate genes associated with lactation volume, composition and calving interval in dairy cattle. Mamm. Genome 27, 81–97. doi: 10.1007/s00335-015-9613-8
van den Berg, I., Boichard, D., and Lund, M. S. (2016). Comparing power and precision of within-breed and multibreed genome-wide association studies of production traits using whole-genome sequence data for 5 French and Danish dairy cattle breeds. J. Dairy Sci. 99, 8932–8945. doi: 10.3168/jds.2016-11073
Wang, X., Ma, P., Liu, J., Zhang, Q., Zhang, Y., Ding, X., et al. (2015). Genome-wide association study in Chinese Holstein cows reveal two candidate genes for somatic cell score as an indicator for mastitis susceptibility. BMC Genet. 16:111. doi: 10.1186/s12863-015-0263-3
Wang, Y. H., Reverter, A., Mannen, H., Taniguchi, M., Harper, G. S., Oyama, K., et al. (2005). Transcriptional profiling of muscle tissue in growing Japanese Black cattle to identify genes involved with the development of intramuscular fat. Aust. J. Exp. Agric. 45, 809–820. doi: 10.1071/EA05058
Keywords: candidate genes, candidate SNP, GWAS, meta-analysis, sequence imputation
Citation: Marete AG, Guldbrandtsen B, Lund MS, Fritz S, Sahana G and Boichard D (2018) A Meta-Analysis Including Pre-selected Sequence Variants Associated With Seven Traits in Three French Dairy Cattle Populations. Front. Genet. 9:522. doi: 10.3389/fgene.2018.00522
Received: 03 June 2018; Accepted: 16 October 2018;
Published: 06 November 2018.
Edited by:Tad Stewart Sonstegard, Recombinetics, United States
Reviewed by:Dan Nonneman, Agricultural Research Service (USDA), United States
Luca Fontanesi, Università degli Studi di Bologna, Italy
Copyright © 2018 Marete, Guldbrandtsen, Lund, Fritz, Sahana and Boichard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Andrew G. Marete, email@example.com