A Meta-Analysis Including Pre-selected Sequence Variants Associated With Seven Traits in Three French Dairy Cattle Populations

A within-breed genome-wide association study (GWAS) is useful when identifying the QTL that segregates in a breed. However, an across-breed meta-analysis can be used to increase the power of identification and precise localization of QTL that segregate in multiple breeds. Precise localization will allow including QTL information from other breeds in genomic prediction due to the persistence of the linkage phase between the causal variant and the marker. This study aimed to identify and confirm QTL detected in within-breed GWAS through a meta-analysis in three French dairy cattle breeds. A set of sequence variants selected based on their functional annotations were imputed into 50 k genotypes for 46,732 Holstein, 20,096 Montbeliarde, and 11,944 Normande cows to identify QTL for milk production, the success rate at insemination of cows (fertility) and stature. We conducted within-breed GWAS followed by across-breed meta-analysis using a weighted Z-scores model on the GWAS summary data (i.e., P-values, effect direction, and sample size). After Bonferroni correction, the GWAS result identified 21,956 significantly associated SNP (PFWER < 0.05), while meta-analysis result identified 9,604 significant SNP (PFWER < 0.05) associated with the phenotypes. The meta-analysis identified 36 QTL for milk yield, 48 QTL for fat yield and percentage, 29 QTL for protein yield and percentage, 13 QTL for fertility, and 16 QTL for stature. Some of these QTL were not significant in the within-breed GWAS. Some previously identified causal variants were confirmed, e.g., BTA14:1802265 (fat percentage, P = 1.5 × 10−760; protein percentage, P = 7.61 × 10−348) both mapping the DGAT1-K232A mutation and BTA14:25006125 (P = 8.58 × 10−140) mapping PLAG1 gene was confirmed for stature in Montbeliarde. New QTL lead SNP shared between breeds included the intronic variant rs109205829 (NFIB gene), and the intergenic variant rs41592357 (1.38 Mb upstream of the CNTN6 gene and 0.65 Mb downstream of the CNTN4 gene). Rs110425867 (ZFAT gene) was the top variant associated with fertility, and new QTL lead SNP included rs109483390 (0.1 Mb upstream of the TNFAIP3 gene and 0.07 Mb downstream of PERP gene), and rs42412333 (0.45 Mb downstream of the RPL10L gene). An across-breed meta-analysis had greater power to detect QTL as opposed to a within breed GWAS. The QTL detected here can be incorporated in routine genomic predictions.

A within-breed genome-wide association study (GWAS) is useful when identifying the QTL that segregates in a breed. However, an across-breed meta-analysis can be used to increase the power of identification and precise localization of QTL that segregate in multiple breeds. Precise localization will allow including QTL information from other breeds in genomic prediction due to the persistence of the linkage phase between the causal variant and the marker. This study aimed to identify and confirm QTL detected in within-breed GWAS through a meta-analysis in three French dairy cattle breeds. A set of sequence variants selected based on their functional annotations were imputed into 50 k genotypes for 46,732 Holstein, 20,096 Montbeliarde, and 11,944 Normande cows to identify QTL for milk production, the success rate at insemination of cows (fertility) and stature. We conducted within-breed GWAS followed by across-breed meta-analysis using a weighted Z-scores model on the GWAS summary data (i.e., P-values, effect direction, and sample size). After Bonferroni correction, the GWAS result identified 21,956 significantly associated SNP (P FWER < 0.05), while meta-analysis result identified 9,604 significant SNP (P FWER < 0.05) associated with the phenotypes. The meta-analysis identified 36 QTL for milk yield, 48 QTL for fat yield and percentage, 29 QTL for protein yield and percentage, 13 QTL for fertility, and 16 QTL for stature. Some of these QTL were not significant in the within-breed GWAS. Some previously identified causal variants were confirmed, e.g., BTA14:1802265 (fat percentage, P = 1.5 × 10 −760 ; protein percentage, P = 7.61 × 10 −348 ) both mapping the DGAT1-K232A mutation and BTA14:25006125 (P = 8.58 × 10 −140 ) mapping PLAG1 gene was confirmed for stature in Montbeliarde. New QTL lead SNP shared between breeds included the intronic variant rs109205829 (NFIB gene), and the intergenic variant rs41592357 (1.38 Mb upstream of the CNTN6 gene and 0.65 Mb downstream of the CNTN4 gene). Rs110425867 (ZFAT gene) was the top variant associated with fertility, and new QTL lead SNP included rs109483390 (0.1 Mb upstream of the TNFAIP3 gene and 0.07 Mb downstream of PERP gene), and rs42412333 (0.45 Mb downstream of the RPL10L gene). An across-breed meta-analysis had greater power to detect QTL as opposed to a within breed GWAS. The QTL detected here can be incorporated in routine genomic predictions.

INTRODUCTION
Ideally, the magnitude of estimated effects on the quantitative trait of interest could be used to rank single nucleotide polymorphisms (SNP) for a functional genomic study to identify causal variants. The physical locations of the associated SNP on the genome are then flagged and mined for causative variants underlying quantitative trait loci (QTL). Causal variants reported for QTL in previous dairy cattle studies include polymorphisms causing variation in milk production, fertility, and embryo mortality (Grisart et al., 2002;Hoff et al., 2017;Michot et al., 2017;Bouwman et al., 2018). Most analyses testing genomic regions, like those above, perform within-breed GWAS (Chang et al., 2018). Even so, our knowledge regarding causal variants in these genomic regions remains limited because the GWAS analyses yield similar P-values for many adjacent SNP variants, a consequence of linkage disequilibrium (LD). Even with a multitude of GWAS results, strong LD prevents distinguishing the actual causal variant from linked markers (Goddard and Hayes, 2009). A meta-analysis can be used to improve the resolution of QTL detection and identify causal variants provided that LD is conserved at short distances across breeds (van den Berg et al., 2016). The main advantage of a meta-analysis is that it allows simultaneous analysis of many breeds by combining GWAS summary statistics across populations, thereby increasing power to detect QTL (van den Berg et al., 2016;Bouwman et al., 2018).
In this paper, we report an across-breed meta-analysis of GWAS summary statistics based on imputed pre-selected sequence variant genotypes from 78,772 cattle from three dairy breeds and for seven traits. The meta-analysis across breeds allowed us to identify 142 QTL for milk production, stature, and fertility.

Studied Population and Phenotypes
We studied phenotypes from three French dairy cattle breeds. No ethical approval was necessary since the data used was from existing databases. Phenotypes were from cows born between 2007 and 2013 in the French dairy production regions and were defined as follows: (1) Production traits including milk, protein, and fat yields were obtained from test-day records expressed as 305 d yield (kg). Fat content (g/kg) was calculated as 1,000 * (fat yield/milk yield). Protein content (g/kg) was calculated as 1,000 * (protein yield/milk yield).
(2) Fertility was defined as success/failure at each insemination of lactating cows and recorded as a success (=1) or a failure (=0). (3) Stature was measured as the vertical distance from the plank to the sacrum.
These traits were analyzed using the French national evaluation model (Boichard et al., 2012b) to obtain Yield Deviations (YD) (VanRaden and Wiggans, 1991). A YD can be defined as the mean performance adjusted for all environmental effects, including the permanent environment for cows with multiple records. The model varied according to the trait, especially concerning environmental factors. We retained traits for genotyped cows and this included data from 46,732 Holstein cows, 20,096 Montbeliarde cows, and 11,944 Normande cows.
(2) Whole genome sequence variants-a custom part of up to 7,232 SNP selected from sequence data as part of 1,000 Bull Genomes Project Run 4 (Daetwyler et al., 2014) based on their functional annotation. The full description of the EuroG10k chip and quality control procedure was previously described by Marete et al. (2018a) and Boichard et al. (2018). Cows with phenotypes were rather old animals and were genotyped with the 50 k chip. Younger animals including cows without phenotypes were genotyped with the EuroG10k chip. To get complete marker information across animals, and to include the sequence variants in the analysis, we run two successive steps of imputation for the entire populations (males and females, with and without phenotypes, see Figure 1). First, we imputed all 50 k markers for all animals using animals with 50 k genotypes as a reference. Then we imputed sequence variants using the animals genotyped with the EuroG10k chip as the reference population. The imputation was done within breed using the FImpute software (Sargolzaei et al., 2014). After imputation, 48,576 SNP distributed across 29 Bos taurus autosomes (BTA) remained. Of these, 42,967 were from the 50 k chip, and 5,609 sequence variants selected from French cattle populations. Imputation accuracy was estimated using both concordance rate and allelic squared correlation (r 2 ), both for animals genotyped with 50 k only and those genotyped with EuroG10k chip only. In both instances, we randomly masked 20% of the genotyped markers on each chromosome and 15% of the cows per breed, then performed an imputation and compared the imputed genotypes with true genotypes by estimating a Pearson correlation coefficient. For the 50 k SNP, the average concordance was >0.98, and average allelic r 2 was >0.97, and for the sequence variants, the average concordance was >0.95, and average r 2 was >0.96 for all breeds.
Association Studies, Meta-Analysis, and QTL Heterogeneity First, 48,576 SNP (pre-selected sequence variants and 50 k SNP) were tested individually for association with each trait within the breed. The GCTA software (Yang et al., 2011) was used to fit a mixed linear regression model to test associations between a SNP and the trait. For any given trait, the fitted model was where y was a vector of phenotypes (yield deviation) for all cows, 1 was a row vector of 1s, µ was the mean, x was the vector of true or imputed allele dosages for the SNP, β was the FIGURE 1 | Genotype imputation workflow. fixed allele substitution effect of the SNP, g was the vector of random additive genetic effects with g ∼ N(0, Gσ 2 g ), G being the genomic relationship matrix, σ 2 g was the variance explained by all the SNPs, ε was a vector of random residual effects with ε ∼ N(0, Iσ 2 ε ), σ 2 e was the error variance, and I was an identity matrix. The variance of y was var y = Gσ 2 g + Iσ 2 ε . The g jk term of G matrix was estimated by where w was the total number of SNP, z ij and z ik were numbers of copies of the reference allele for the i th SNP in the j th and k th cows, respectively, and p i was the frequency of the reference allele estimated from the marker data (VanRaden, 2008).
To control the type I error rate and assuming all tests were independent (a conservative assumption due to LD among SNP), a Bonferroni genome-wide correction was applied for the total number of tests for each trait i.e., the nominal type I error rate (α = 5%) was divided by the number of SNP (w = 48,576) to obtain a genome-wide error rate threshold of t = 1.03 × 10 −6 . Any SNP whose probability of observing the test statistics was below this value was considered genome-wide significant.
The within-breed analyses were followed by a meta-analysis using weighted Z-score model as implemented in METAL software (Willer et al., 2010). The weighted Z-score model used P-values and directions of effect estimates and weights individual GWAS based on the sample size to compute a Z-score, i.e., where Z b was the Z-score for breed b, was the standard normal cumulative distribution, P b was the P-value, b was the direction of the SNP effect estimated within breed b. Overall Z-score was calculated as: where w b = √ N b and N b was the sample size for breed b. The overall P-value was calculated as: We evaluated heterogeneity of the effect sizes across breeds using Cochran's Q-test (Cochran, 1954) as implemented in the METAL software. A QTL was deemed to segregate across populations if the heterogeneity P-value was ≤ 0.05. The functional consequence of significantly associated variants was predicted using the Variant Effect Predictor tool from Ensembl Genome Browser 90 (McLaren et al., 2016). We classified variants that (i) mapped to a gene, (ii) <5-kb to known genes, and, (iii) >5-kb from any coding region. In case of multiple variants representing the same gene, we kept the variant associated with most traits and with the lowest P-value as representative for that gene.

Identification of QTL That Segregates Within and Across Breeds
Since it is challenging to identify the causative variant, we aimed at determining the QTL region that most likely harbor the causal variant. First, for each chromosome, we subjectively inspected significantly associated variants within the genomic regions. Secondly, we defined the within-breed QTL region as 1 Mb windows and the SNP with the lowest P-value designated as the lead SNP. A QTL was selected as significant in more than one breed if any of the SNP in the QTL region of the other breed had a P-value lower than the genome-wide threshold (P < 1.03 × 10 −6 ). The UMD3.1 (Zimin et al., 2009) bovine genome annotation was used to annotate genes centering within 1 Mb intervals on the lead SNP. We compared these annotations to known QTL for bovine milk production, fertility and stature traits using QTLdb (Hu et al., 2016) and literature review.

Association Studies
Overall, there were 21,956 SNP significantly associated with traits in three French cattle breeds. Forty-three percent of these associated SNP were from Holstein, 24% from Montbeliarde, 14% from Normande. Seven percent of these SNP showed association in both Holstein and Montbeliarde, 4% in Holstein and Normande, 5% in Montbeliarde and Holstein, and 3% in all three breeds. Production traits and stature had more significant hits across breeds, compared to fertility. As presented in Figures 2A-E, the overlap of significant tests in any two-breed combination and for all traits was most evident between Holstein and Montbeliarde (1,621 SNP) and least evident between Montbeliarde and Normande (1,082 SNP). Seven hundred and one SNP were significantly associated with the same trait in all three breeds.
As presented in Figures 2F-I, some of the imputed sequence variants had lower P-values compared with the 50 k variants P-values. Although more than 13% of the pre-selected sequence variants had MAF between 0.5 and 5%, only 2 QTL lead SNP had MAF lower than 5%. We did not detect QTL with MAF <5% in Normande. Among the pre-selected sequence variants, 1,478 were significant (P < 1.03 × 10 −6 ).
A QTL on BTA8 at 29-30 Mb with the same lead SNP at BTA8:30088133 (rs109205829) was observed in both Holstein and Montbeliarde. This lead SNP is an intronic variant within the NFIB gene, and the meta-analysis had a lower P-value (P = 2.3 × 10 −22 ) than either of the single breed GWAS (P = 8.4 × 10 −14 ). Another QTL on BTA22 at 23-26 Mb with a lead SNP at BTA22:24219999 (rs41592357) was observed in all three breeds. Rs41592357 is an intergenic variant 1.38 Mb upstream of the CNTN6 gene and 0.65 Mb downstream of the CNTN4 gene. Each dot shows the most significant P-value that was observed across both traits. Light green color represents sequence variants with P <1 × 10 −6 .
Frontiers in Genetics | www.frontiersin.org Populations in order are MON, NOR, and HOL; "+" and "-" denote positive and negative substitution effects of the alternate allele. "?" indicates that the variant did not segregate in the respective population. The P-value of Cochran's Q-test for heterogeneity of the effect sizes across breeds is given in parentheses, and is significant if P < 0.05.
Frontiers in Genetics | www.frontiersin.org Populations in order are MON, NOR, and HOL; "+" and "-" denote positive and negative substitution effects of the alternate allele. "?" indicates that the variant did not segregate in the respective population. The P-value of Cochran's Q-test for heterogeneity of the effect sizes across breeds is given in parentheses, and is significant if P < 0.05.
Frontiers in Genetics | www.frontiersin.org  Populations in order are MON, NOR, and HOL; "+" and "-" denote positive and negative substitution effects of the alternate allele. "?" indicates that the variant did not segregate in the respective population. The P-value of Cochran's Q-test for heterogeneity of the effect sizes across breeds is given in parentheses, and is significant if P < 0.05.

Meta-Analysis Heterogeneity Among Three Breeds
As an example of the heterogeneity observed from a metaanalysis of the three breeds, we observed peaks from a combined Manhattan plot for fat and protein percentage (Figures 3A,B) and standardized the allelic substitution effects by the phenotypic standard deviation of fat and protein percentage meta-analysis results. We observed heterogeneity (Cochran's Q, P < 0.05) for 36 QTL from the meta-analysis in three breeds ( Figure 3D). We then overlay observed breed specific QTL for these two traits and Holstein had 26 QTL in 24 chromosomes, Montbeliarde had 17 QTL in 14 chromosomes, and Normande had 16 QTL in 7 chromosomes ( Figure 3C).

DISCUSSION
Our meta-analysis of association summary statistics for seven traits across three French dairy cattle breeds discovered 120 QTL including 13 QTL that had not been detected at P < 1.03 × 10 −6 in the within-breed analyses. In agreement with previous studies (Pausch et al., 2017) our results show that combining GWAS summary data from several breeds increases the power of association studies. However, for lowly heritable traits, and for distinct populations, a meta-analysis may not be as robust as the within-breed GWAS (Raven et al., 2014). For instance, within-breed GWAS showed associated SNP for fertility had no overlap between breeds probably due to small effects per locus (i.e., low power) and possibly the action of many, reasonably rare recessive lethal alleles. Holstein, with larger sample size, had associated SNP in strong LD and thus had stronger fertility predicted effects. In some instances, the meta-analysis increased the power, and consequently, the confidence interval for some fertility related QTL became visible, e.g., QTL on BTA2, 8 and 20 (Table 4). However, in other instances, the QTL detection power reduced when the QTL are private, e.g., the meta-analysis for fertility lost power compared to the within-breed GWAS when the Normande breed (with the smallest number of animals) was included in the analysis, e.g., QTL on BTA15 and 24 (Table 4).
One advantage of a meta-analysis over within-breed GWAS is that it is expected to result in more precision in QTL position due to the breakdown of long-range LD due to many generations of recombination since the breeds were separated (Raven et al., 2014;Pausch et al., 2016). In this study, we performed a withinbreed GWAS first, followed by meta-analyses of GWAS summary statistics from three breeds using a weighted Z-score model to get the P-value for each SNP. In our study, the meta-analysis increased the P-value for some variants, probably because the QTL was not segregating across populations. Meta-analysis decreased P-value for some variants, and consequently, some QTL peaks became more distinct and narrower. For example, a region on BTA6:87296809 was strongly associated with protein yield and mapped near the protein-coding genes ODAM and CSN3 in a 120 kb region. Bovine odontogenic, ameloblastassociated (ODAM) participates in structuring the extracellular matrix to attach epithelial cells to mineralized surfaces thus forming a protective seal that is antagonistic to bacterial invasion (Fouillen et al., 2017). On the other hand, Casein kappa (CSN3) is associated with daily milk yield (Bonfatti et al., 2011;Bartonova et al., 2012) as well as better cheese properties (Di Gregorio et al., 2017).
Strongly associated SNP (P < 1 × 10 −50 ) were identified in all three French dairy breeds. Among these were several SNP that had previously been described such as those close to the DGAT1-K232A mutation on BTA14 (Grisart et al., 2002). These SNP were highly associated with all milk production traits in all three breeds. A very significant sequence variant at BTA5:93948357 (rs209372883) was associated with protein and fat in Holstein and Montbeliarde; it has previously been ascribed to a variant in MGST1 (microsomal glutathione S-transferase 1; Raven et al., 2016), an upstream intron variant. MGST1 is also an inflammation response gene which is highly expressed through pregnancy and lactation (Church et al., 2012). Previous studies in Japanese Black cattle (Wang et al., 2005) suggested upregulation of MGST1 during adipocyte development in the longissimus muscle. Slightly upstream, a highly significant peak was centered within SLC15A5, a protein-coding gene. Our results suggest that both MGST1 and SLC15A5 may contain variants affecting milk production. Raven et al. (2014) identified a fat yield QTL within 3kbp of MGST1 using a multi-breed analysis and we report a QTL which was within 2kbp of the genic region of MGST1 using metaanalysis. Similar studies in Canadian Holstein, and using a similar density chip (50 k), reported a QTL 200 bp from that reported in our study . The same Canadian Holstein study reported several fat yield candidate genes including SLC2A3 and GDF3 at 101.7-101.8 Mb and LRP6, EMP1, and DUSP16 at 97-98 Mb on BTA5 . Their results agrees with our meta-analysis results. Another very significant QTL for milk yield in Holstein was identified on BTA20 at ∼38 Mb. The lead SNP of this QTL was located near the PRLR gene which has previously been associated with fertility in Holstein cows (Leyva-Corona et al., 2018;Marete et al., 2018b). The majority of the associated SNP with fertility, a low heritability trait, were 50 k chip variants. For instance, rs110992367, the lead SNP for a 3.3 Mb QTL for the success rate of insemination in Holstein cows was 0.38 Mb from the 5 ′ flanking region of the G2E3 gene. Meta-analysis heterogeneity for this QTL was observed in all three breeds (P = 9.5 × 10 −3 ), but opposite in the effect direction between Holstein and the other two breeds. The heterogeneity is important because the G2E3 gene is an essential protein-coding gene that prevents apoptotic death during embryonic development (Brooks et al., 2008). This gene has been linked to apoptosis and cellular stress in US beef cattle (Howard et al., 2014) and somatic cell score in Chinese Holstein (Wang et al., 2015). This gene is a good candidate affecting cow fertility.
All QTL identified by meta-analysis were not significant in all within-breed analysis (Tables 1-6, Figure 3C) probably due to low power to reach significance at P < 1 × 10 −6 or lack of segregation in the breed (Sham and Purcell, 2014).
For instance, the variant mapping to the PLAG1 gene only segregated in the Montbeliarde breed. However, most of the SNP had an effect in the same direction in all breeds for the more significant variants, e.g., BTA18:58067310 (CEACAM18 gene, fat) and BTA10:51558941 (FAM63B gene, stature). This suggests that most detected QTL segregate in all three breeds even though they were not significant in the within-breed analysis. In this study, pre-selected sequence variants were directly genotyped or imputed from a large reference population genotyped with the EuroG10k chip, and thus were known with very high accuracy. This was evident from the high average allelic correlation (r 2 = 0.97) between real and imputed genotypes for the three breeds. In conclusion, many QTL segregated across the three French cattle breeds more so for highly heritable traits. The metaanalysis across the breeds using the within-breed association summary data increased the power of QTL detection. This QTL information can be incorporated in routine genomic evaluation in French populations. If an independent population of dairy cattle is genotyped with the EuroG10k chip, then the true causal variant underlying the QTL may be validated in independent population/breed. The next step of this study is to use independent Nordic cattle data to validate the QTL reported here.

AUTHOR CONTRIBUTIONS
AM analyzed data and wrote the manuscript. BG contributed to the design of the custom part of the EuroG10k chip, interpretation of results and revision of the manuscript. ML worked on the study design, secured project funding and revised the manuscript. SF contributed to the design of the custom part of the EuroG10k chip and the management of the marker information. GS interpreted the results and revised the manuscript. DB worked on the study design, interpretation of results and revision of the manuscript.

FUNDING
AM is a recipient of an Erasmus Mundus Ph.D. grant funded jointly by European Graduate School in Animal Breeding and Genetics (France), and Center for Genomic Selection in Animals and Plants (GenSAP) supported by Innovation Fund Denmark (grant 0603-00519B).