Identification of Whole-Genome Significant Single Nucleotide Polymorphisms in Candidate Genes Associated With Serum Biochemical Traits in Chinese Holstein Cattle

A genome-wide association study (GWAS) was conducted on 23 serum biochemical traits in Chinese Holstein cattle. The experimental population consisted of 399 cattle, each genotyped with a commercial bovine 50K SNP chip containing 49,663 SNPs. After data cleaning, 41,092 SNPs from 361 Holstein cattle were retained for the GWAS. The phenotypes were serum biochemical measurements taken from these animals 11 days after parturition. Two statistical models, a fixed-effect linear regression model (FLM) and a mixed-effect linear model (MLM), were used to estimate the association effects of SNPs. The genome-wide significant and suggestive thresholds were set at 1.22E−06 and 2.43E−06, respectively. In the Chinese Holstein population, the FLM identified 81 genome-wide significant (0.05/41,092 = 1.22E−06) SNPs associated with 11 serum traits. Among these, five SNPs (BovineHD0100005950, ARS-BFGL-NGS-115158, BovineHD1500021175, BovineHD0800028900, and BTB-00442438) were also identified by the MLM as having genome-wide suggestive effects on CHE, DBIL, and LDL. Both statistical models pinpointed two SNPs with genome-wide significant effects on serum traits in the Holstein population. The SNP BovineHD0800028900 (located near the gene LOC101903458 on chromosome 8) was significantly associated with serum high- and low-density lipoprotein (HDL and LDL), whereas BovineHD1500021175 (located at 73.4 Mb on chromosome 15) was significantly associated with total bilirubin and direct bilirubin (TBIL and DBIL). Further analyses are needed to identify the causal mutations affecting serum traits and to investigate the correlation of effects for loci associated with fatty liver disease in dairy cattle.

All animals were genotyped with a bovine 50K SNP chip (49,663 SNPs). SNPs from the X chromosome were included because most individuals in the study population were female. After the data quality control procedure (Yue et al., 2017; Yan et al., 2019), 361 animals with 41,092 SNP genotypes were retained for the subsequent GWAS analysis. The physical map length, the number of SNPs, and the SNP density on each chromosome, before and after the data cleaning procedure, are shown in Supplementary Table S2. A pair-wise linkage disequilibrium (LD) analysis was conducted for the Holstein population. The results showed high genome-wide similarity of LD patterns among the cattle populations (Supplementary Figure S1). The similarity might reflect shared breeding histories among the cattle. Multidimensional scaling (MDS) analysis of 12,380 independent SNP markers (Purcell et al., 2007; Yue et al., 2017; Yan et al., 2019) with $r^2$ < 0.2 (Wang et al., 2009), using the first and second components, indicated slight population stratification (Supplementary Figure S2). To better correct for cryptic population stratification, the first MDS component was used as a covariate in the subsequent genome-wide association analysis (Supplementary Figure S3).
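The selection of approximately independent markers by an $r^2$ cutoff can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts (0/1/2) and measures LD as the squared Pearson correlation between two SNPs. The function names `ld_r2` and `greedy_prune` are hypothetical, and the all-pairs greedy filter is a simplification for illustration, not the windowed procedure of any particular software package or necessarily the procedure used in this study.

```python
def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two vectors of allele counts (0/1/2)."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    saa = sum((a - mean_a) ** 2 for a in snp_a)
    sbb = sum((b - mean_b) ** 2 for b in snp_b)
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    if saa == 0 or sbb == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sab * sab / (saa * sbb)


def greedy_prune(genotypes, max_r2=0.2):
    """Return indices of SNPs whose r2 with every previously kept SNP is below max_r2.

    genotypes: list of SNPs, each a list of allele counts across animals.
    """
    kept = []
    for idx, snp in enumerate(genotypes):
        if all(ld_r2(snp, genotypes[k]) < max_r2 for k in kept):
            kept.append(idx)
    return kept


# Toy data (hypothetical): the second SNP duplicates the first and is pruned.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [1, 0, 1, 1, 2, 1],
]
print(greedy_prune(toy))
```

Comparing every pair of SNPs scales quadratically with the number of markers, so a practical analysis on tens of thousands of SNPs would restrict comparisons to nearby markers.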
Following a previously described method (Yue et al., 2017), the GWAS was carried out with two statistical models, a fixed-effect linear model (FLM) and a mixed-effect linear model (MLM), implemented in the PLINK software package v1.07 (Purcell et al., 2007) and the GCTA software package v1.2.4 (Yang et al., 2011), respectively. The FLM is of the form $y = W\alpha + x\beta + e$, where $y$ is a vector of phenotypic values; $\alpha$ is a vector of fixed effects including the population mean and the first MDS component; $W$ is the design matrix for the fixed effects; $\beta$ is the marker effect; $x$ is a vector of marker genotypes; and $e$ is a vector of random errors with distribution $N(0, I\sigma_e^2)$, where $\sigma_e^2$ is the residual variance. For the MLM, an additive genomic relatedness matrix is included to control the type I error, giving the form $y = W\alpha + x\beta + Zu + e$, where $Z$ is the design matrix for the random effects and $u$ is the vector of random effects with distribution $N(0, K\sigma_a^2)$. Here, $\sigma_a^2$ is the additive genetic variance and $K$ is the additive genomic relatedness matrix. The other symbols are the same as in the FLM. The Bonferroni-corrected genome-wide significance and suggestive thresholds (Mapholi et al., 2016; Kerr et al., 2017) were computed to be 1.22E−06 (=0.05/41,092) and 2.43E−06 (=0.1/41,092), respectively.
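As a rough illustration of the fixed-effect model and the Bonferroni thresholds described above, the following sketch tests one SNP by ordinary least squares with an intercept and a single covariate (standing in for the first MDS component). It is not the PLINK implementation: the function names are hypothetical, the covariate is first regressed out of both phenotype and genotype (which gives the same marker estimate as fitting all terms jointly), and the p-value uses a normal approximation to the t distribution, an assumption that is reasonable only for large samples.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def _residualize(values, covariate):
    """Residuals of `values` after regressing on an intercept and one covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    slope = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values)) / scc
    return [(v - mean_v) - slope * (c - mean_c) for v, c in zip(values, covariate)]


def flm_single_snp(y, x, covariate):
    """Estimate the marker effect beta in y = W*alpha + x*beta + e, W = [1, covariate].

    Returns (beta_hat, t_statistic, approximate two-sided p-value).
    """
    n = len(y)
    ry = _residualize(y, covariate)
    rx = _residualize(x, covariate)
    sxx = sum(v * v for v in rx)
    beta = sum(a * b for a, b in zip(rx, ry)) / sxx
    rss = sum((a - beta * b) ** 2 for a, b in zip(ry, rx))
    dof = n - 3  # intercept, covariate, and marker effect
    se = math.sqrt(rss / dof / sxx)
    t_stat = beta / se
    # Assumption: normal approximation to the t distribution (large dof).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return beta, t_stat, p_value


significant = bonferroni_threshold(0.05, 41092)  # about 1.22E-06
suggestive = bonferroni_threshold(0.1, 41092)    # about 2.43E-06
```

A SNP would be declared genome-wide significant when its p-value falls below `significant`, and suggestive when it falls below `suggestive`. The mixed model additionally requires estimating variance components with the relatedness matrix $K$, which this sketch does not attempt.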
A GWAS based on the FLM identified 81 SNPs with genome-wide significant (1.22E−06) association effects on 11 serum traits (Table 1) in the Holstein cattle population. A GWAS based on the MLM identified 15 SNPs as having genome-wide suggestive effects on 11 serum traits (Table 2). Among these SNPs, five (BovineHD0100005950, ARS-BFGL-NGS-115158, BovineHD1500021175, BovineHD0800028900, and BTB-00442438) were identified by both the FLM and MLM as having genome-wide suggestive effects on CHE, DBIL, and LDL.
The SNPs identified through the MLM showed less overlap than those identified through the FLM. However, the set of significant SNPs from the MLM in this study was almost a subset of the SNPs from the FLM. The SNPs identified through the MLM were more conservative because the MLM took into account the additive genetic effects of each animal, so its false positive rate was expected to be lower than that of the FLM. In a GWAS, the FLM with the population structure fitted as covariates may not control the type I error well, while the MLM can lead to false negatives, thus missing some potentially important discoveries (Liu et al., 2016; Supplementary Figure S3). The FLM and MLM are the most popular models in the field of GWAS (Yu et al., 2006; Purcell et al., 2007; Kang et al., 2008, 2010). On the other hand, the low overlap of genome-wide significant SNPs between the FLM and MLM also suggests low heritability ($h^2$) of serum biochemical traits, which could be genetically affected by minor genes.
Interestingly, both statistical models pinpointed two SNPs (BovineHD0800028900 and BovineHD1500021175) that displayed genome-wide significant (1.22E−06) association effects on serum traits in the Holstein population. The SNP BovineHD0800028900, located downstream of the LOC101903458 gene on chromosome 8, was significantly associated with serum high- and low-density lipoprotein (HDL and LDL). The SNP BovineHD1500021175 on chromosome 15 was found to have significant association effects on serum bilirubin (TBIL and DBIL). Further analyses are needed to understand the mechanism underlying the association effects of these SNPs on serum biochemical traits (Du et al., 2013; Hu et al., 2015).
Additionally, several candidate genes or DNA regions that we found to be significantly associated with serum biochemical traits in Holstein cattle coincided with reported association effects on other traits in the literature. For example, six SNPs in the DNA region from 113.6 to 113.7 cM of chromosome 5, closely associated with the TCF20 gene, were identified as having a significant effect on the serum ALP level (Table 1). The same DNA region was reported to harbor a QTL associated with blood triglyceride (TAG) levels (Wu et al., 2014). As another example, Hapmap51041-BTA-72970, located in the downstream region of EEA1 (early endosome antigen 1), was significantly associated with the serum low-density lipoprotein (LDL) level in both Holstein and Jersey cattle in the study. The same region was found to be a QTL with an effect on abomasum displacement in German Holstein cattle (Mömke et al., 2013). MTNR1A (melatonin receptor 1A) was previously found to be associated with intramuscular fat and subcutaneous fat in Qinchuan beef cattle (Yang et al., 2015), and it was also a candidate gene for serum LDL in our study.
In summary, a GWAS was conducted using two statistical models on 23 serum biochemical traits in a Chinese Holstein cattle population. Eighty-one genome-wide significant (1.22E−06) SNPs were identified through the FLM as having association effects on 11 serum biochemical traits. Among these SNPs, five were also identified by the MLM as having genome-wide suggestive effects on CHE, DBIL, and LDL. Two SNPs, BovineHD0800028900 and BovineHD1500021175, were found to be associated with multiple serum lipoprotein levels and serum bilirubin traits, respectively. The role of these SNPs in serum biochemical traits remains to be investigated and validated in future studies. Understanding their roles may increase our understanding of the molecular biology underlying perinatal metabolic disorders, such as fatty liver disease, in dairy cows.

ETHICS STATEMENT
All experiments were carried out according to the Regulations for the Administration of Affairs Concerning Experimental Animals published by the Ministry of Science and Technology, China