Genomic Loci Affecting Milk Production in German Black Pied Cattle (DSN)

German Black Pied cattle (DSN) is an endangered population of about 2,550 dual-purpose cattle in Germany. Having a milk yield of about 2,500 kg less than the predominant dairy breed Holstein, the preservation of DSN is supported by the German government and the EU. The identification of the genomic loci affecting milk production in DSN can provide a basis for selection decisions for genetic improvement of DSN in order to increase market chances through the improvement of milk yield. A genome-wide association analysis of 30 milk traits was conducted in different lactation periods and numbers. Association using multiple linear regression models in R was performed on 1,490 DSN cattle genotyped with BovineSNP50 SNP-chip. 41 significant and 20 suggestive SNPs affecting milk production traits in DSN were identified, as well as 15 additional SNPs for protein content which are less reliable due to high inflation. The most significant effects on milk yield in DSN were detected on chromosomes 1, 6, and 20. The region on chromosome 6 was located nearby the casein gene cluster and the corresponding haplotype overlapped the CSN3 gene (casein kappa). Associations for fat and protein yield and content were also detected. High correlation between traits of the same lactation period or number led to some SNPs being significant for multiple investigated traits. Half of all identified SNPs have been reported in other studies, previously. 15 SNPs were associated with the same traits in other breeds. The other associated SNPs have been reported previously for traits such as exterior, health, meat and carcass, production, and reproduction traits. No association could be detected between DGAT1 and other known milk genes with milk production traits despite the close relationship between DSN and Holstein. The results of this study confirmed that many SNPs identified in other breeds as associated with milk traits also affect milk traits in dual-purpose DSN cattle and can be used for further genetic analysis to identify genes and causal variants that affect milk production in DSN cattle.


INTRODUCTION
German Black Pied cattle (DSN, "Deutsches Schwarzbuntes Niederungsrind") is an endangered breed of about 2,550 dualpurpose cattle in Germany. The ancestor population of this breed is considered as one of the founder populations of the nowadays dominantly used high yielding German Holstein breed (Köppe-Forsthoff, 1967;Grothe, 1993), which is one reason why the German government and the European Union support its preservation. This support is necessary because milk yield of DSN cows is about 2,500 kg less compared to German Holstein, which has led to the replacement of DSN by Holstein cattle. Although DSN is a dual-purpose breed, milk yield is the main contributor to the economic merit and meat yield and carcass quality do not compensate the lower milk yield.
The interest in preservation of DSN does not only stem from its relationship to Holstein cattle and retaining genetic diversity as a gene reserve for the future. The interest in DSN cattle is also due to their advantageous traits. For example, the milk fat and protein content with 4.3 % and 3.7 %, respectively, is higher compared to German Holstein (RBB Rinderproduktion Berlin-Brandenburg GmbH., 2020). Moreover, DSN cattle are considered to be more robust for grazing and more fertile.
One of the long-term goals for maintaining DSN is to reduce governmental financial support by increasing economic value of the breed through the improvement of milk yield. Simultaneously, the advantageous traits and the typical body composition of this dual-purpose breed should be maintained.
So far, little is known about the genes affecting milk traits in DSN. Recently, genome-wide association studies (GWAS) in DSN for health traits identified three significant and two suggestive SNPs for clinical mastitis (Meier et al., 2020) and 44 significant SNPs for endoparasite resistance (May et al., 2019). Another study, investigating genomic variation in the casein gene cluster, found three protein variants of CSN2 and CSN3 and fixed protein variants for CSN1S1 and CSN1S2 (Meier et al., 2019). In contrast, 63,404 associations with milk traits in general were available from Cattle QTLdb Release 42 (accessed 09/21/2020) (Hu et al., 2019), whereof around 79 % were found in studies with Holstein cattle. Furthermore, 10 % of the reported associations in those studies with Holstein cattle are reported within the first 10 Mb on chromosome 14, where the DGAT1 gene is located which is known to influence milk yield and composition (Grisart et al., 2002).
In this study, we investigated the genetic basis contributing to the variation in milk performance of the current DSN population by GWAS. Since the DSN population is small, power to find significant genomic loci is limited. Nevertheless, there is an urgent need to provide genetic association information for small endangered populations to support their preservation and further development. Even if not all genomic loci are significantly detectable, the obtained results in DSN and the comparison to related breeds provide a basis for selection decisions to genetically improve DSN.

Populations
Data of 1,816 DSN cows was available for this study. These cows represent about two thirds of the DSN population registered in Germany (The Society for the Conservation of Old and Endangered Livestock Breeds (GEH), 2018). The cows were born between 2005 and 2016, and descended from 76 sires. Cows were raised on six farms. In order to reduce environmental influences, we filtered to have at least 20 DSN cows per farm, per sire, and per birth year. This reduced the data set to 1,490 DSN cows from five farms, born between 2007 and 2016, and descending from 28 sires.
In order to compare the results that we obtained in DSN with German Holstein, we used GWAS results previously obtained in our lab on a population of 2,400 German Holstein bulls. This population has been used and described in detail repeatedly (Zielke et al., 2011(Zielke et al., , 2013Abdel-Shafy et al., 2018).

Phenotypes
Milk traits with corresponding pedigree data were obtained from the cattle breeding association "RBB Rinderproduktion Berlin-Brandenburg GmbH" in April 2020. Traits included milk, fat, and protein yield in kilogram (milk kg, fat kg, and protein kg) for three lactation periods: 100-days (100d), 200days (200d), and 305-days (305d). 305 days data was available for the first three lactations (LA1, LA2, and LA3), whereas 100 and 200 days data was available only for LA1. 305 days data of cows with < 270 days in milk was not considered. Fat and protein content (fat %, protein %) were calculated by dividing fat or protein kg by milk kg of the respective lactation. The lactation mean (LAm) was calculated for cows with full data in the first three lactations. This leads to a total of 30 investigated milk traits in this study. For each trait, outliers were defined as values outside the 1.5 times interquartile range within each farm and removed from the data set. This leads to data being available for 1, 478,1,476,1,372,1,160,862,and 685 DSN cows in LA1 (100d), LA1 (200d), LA1, LA2, LA3, and LAm, respectively.

Genotypes
DSN cattle cows were genotyped using the Illumina R Bovine50SNP v3 BeadChip (Illumina, Inc., 5200 Illumina Way, San Diego, CA, United States). SNP chip probe sequences were remapped against the Bos taurus genome version ARS-UCD1.2 (Rosen et al., 2018) using NCBI Nucleotide-Nucleotide BLAST version 2.2.31+ (Altschul et al., 1990) in order to obtain genome positions for SNPs on the ARS-UCD1.2 genome build. SNP probes that mapped to multiple genomic locations were removed. Genotype quality control was performed for animals and SNPs. SNP calls with a GC-score < 0.7 were set to missing. Animals with a call rate < 90% were discarded. SNPs with a call rate < 95% and a minor allele frequency (MAF) < 5% were removed. Lastly, genotype groups with less than 30 observations were set to missing to prevent spurious association. After quality control, 36,929 high confident SNPs were available for further analysis.

Genome-Wide Association Study
Genome-wide association studies was performed with multiple linear regression models implemented in the R language for statistical computing (version 4.0.3) (R Core Team, 2018). Various models were tested and compared by calculating the inflation factor λ of each model to judge the extent of the excess of type-I errors (Devlin and Roeder, 1999). Finally, the model with the lowest λ was selected (Supplementary Table 1). On average 27 % of top 100 SNPs of the selected model are shared with each of the other models tested (Supplementary Figure 1). The resulting model for testing the additive effect of each SNP was: where Y ijklmnopq is the trait, ps i represents the covariate for population stratification followed by the covariates for farm f j , sire s k , birth year by l , birth season bs m , calving year cy n , calving season cs o , age at first calving in days ac p , the SNP genotype g q , and e ijklmnopq is the residual error. Covariates marked with an asterisk " * " were only included into the model when the difference in the Akaike information criterion ( AIC) was ≤ −10 between the null model (Y i = ps i ) and the null model extended with one of the covariates (Y ix = ps i + covariate x ) (Supplementary Table 2). All covariates were included as fixed effects as this resulted in the lowest inflation factor λ. Population stratification ps i among the 1,490 DSN cows was examined using pairwise population concordance tests on an identity-by-state matrix implemented in PLINK (version 1.90) with a p-value cut-off of 0.0001 (Purcell et al., 2007;Chang et al., 2015). The resulting 33 clusters of relatedness were included as a covariate in the GWAS model. Interactions between lactation stages and SNP genotypes were investigated for 100, 200, and 305 days performance data in LA1 and between lactation number and SNP genotypes for 305 days performance data in lactations 1 to 3. For testing the genetic interaction effects between milk, fat, and protein yield data and specific lactation periods, the difference between the difference between birth and 100 days, 100 and 200 days, and 200 and 305 days performance data was used as input data. A linear mixed-effects model as described by Lu and Bovenhuis (2019) was fitted using the R package lmerTest (version 3.1-3) (Kuznetsova et al., 2017): where the same covariates were used as in Eq. 1 but extended by the lactation stage or lactation number L r , the interaction term between lactation stage or lactation number and SNP genotype L r x g q , and a random intercept included for each individual animal (1| animal s ) to compensate for repeated measurements on the animals. Covariates marked with an asterisk " * " were only included into the model when the difference in the Akaike information criterion ( AIC) was ≤ = −10 between the null model [Y s = (1|animal s )] and the null model extended with one of the covariates [Y xs = covariate x + (1|animal s )] (Supplementary Table 3). Separate analyses were performed for the traits milk, fat and protein yield, and fat and protein content. The significance threshold was adjusted for multiple testing using Bonferroni (BF) correction. The number of independent tests (M eff ) was estimated by the simpleM method, to account for linkage between neighboring SNPs (Gao et al., 2008). Window sizes from 100 to 630 (630 corresponds to the minimum number of SNPs in one out of all chromosomes) in this study were tested. The lowest estimated M eff value was 18,278 at a window size of 630 which was then used for Bonferroni correction. After Bonferroni correction, SNPs were considered highly significant when P BF < 0.01, significant when P BF < 0.05, or suggestive when P BF < 0.1. In the case of interaction terms, M eff was multiplied by the number of lactation stages or lactation numbers (n = 18,278 * 3) and p-values were noted as P BI . Figures were produced using the R package ggplot2 (version 3.3.2) (Wickham et al., 2016) unless otherwise stated. For SNP effect plots, p-values between genotype groups were estimated using pairwise t-tests and displayed using R package ggpubr (version 0.4.0) (Kassambara, 2020).

Haplotypes and Gene Annotation
Haplotype blocks in which SNPs are located were computed using Haploview version 4.2 (Barrett et al., 2005). In order to define blocks in Haploview, the customized block definition was used with the option "solid LD spine" set to D > 0.6. Genes within each haplotype were annotated using R package biomaRt (version 2.48.0) (Durinck et al., 2009) using the Ensembl Bos taurus database based on the ARS-UCD1.2 assembly (Yates et al., 2020). If no haplotype block could be estimated for a SNP, genes within ± 70 kb up-and down-stream (corresponding to the median haplotype block length of 140 kb) of the respective SNP position were considered as positional candidate genes. Additionally, the 1 Mb region centered around the investigated SNP was inspected for candidate genes. If consecutive 1 Mb regions overlapped, they were merged by taking the start of the first and the end of the second region.

Overlap With Cattle QTLdb
Associated SNPs were compared to GWAS results and known QTLs from Cattle QTLdb Release 42 (Hu et al., 2019) by using their SNP-IDs (rs). Further, also the 1 Mb region centered around the associated SNP was used to find overlapping loci with other publications. The entries from Cattle QTLdb were categorized into the trait categories "exterior, " "health, " "meat and carcass, " "milk, " "production, " and "reproduction."

Associations for Almost All Traits
Genome-wide association analyses using 1,490 DSN cows and 36,929 high confident SNPs identified associations for all traits except for 100-days fat yield in lactation 1 (fat kg 100 days LA1) and the lactation mean of the 305-days fat yield (fat kg LAm). The inflation factor λ ranged from 1.27 for fat yield (LA2 305 days) to 2.19 for protein content (LA2 305 days) although extensive efforts were taken to select a GWAS model that showed the lowest overall inflation (Supplementary Table 1). Nonetheless, the inflation factor λ and the Q-Q-plots for all traits (Supplementary Figure 2) showed that the level of statistical significance was overestimated in general. We assume that this especially holds for protein content in all lactation periods and numbers and for fat content in LA1 (100 and 305 days). The number of false positives was increased as λ-values higher or equal to 1.5 were observed for these traits. Even if we reduce the genome-wide significance threshold to P BF < 0.0005, 16 SNPs remain associated. The results of those traits are listed in the Supplementary Table 4. Results of associations for 20 traits where λ was below 1.5 are presented in Table 1. For those traits, 14 highly significant, 27 significant, and 20 suggestive SNPs were found. Some SNPs were associated with multiple traits because of high correlation between those traits (Supplementary Tables 5,6). High correlation was found between performance data within the first lactation (100, 200, and 305 days of LA1) (r > 0.76), and between traits of the first three lactations (LA1, LA2, and LA3) and the lactation mean (LAm) (r > 0.77). Fat and protein yield within a lactation were also highly correlated (r > 0.75).

Effects on Milk Yield
Since milk yield is the economically most important production trait, there is major interest in identifying genetic loci contributing to its variance in DSN cows (Table 1 and Figure 1). Using the mean milk yield across the first three lactations (LAm), only two loci were found to be associated, one on chromosome 8 at 53.7 Mb (8:53,663,120, rs41793393, P BF = 0.03554) and another one on chromosome 9 at 10.6 Mb (9: 10,638,013, rs133869947, P BF = 0.04397) (Supplementary Figures 3A,B). These two loci were also associated with milk yield in lactation 3 (LA3). Examining different lactation periods and lactations separately, we found additional loci on chromosomes 1, 4, 5, 6, 20, 24, 25, and X. Since the significance varied largely between lactation periods and lactation numbers, the interaction between the SNPs and lactation stage or lactation number was also investigated (Supplementary Table 7).
In the first lactation (LA1), chromosome 6 was most significantly associated (Table 1 and Figure 1). In the region between 60.2 and 87.2 Mb, 9 SNPs were associated with milk yield in all lactation periods of LA1. The top marker rs109592101 (6:86,112,142, P BF = 0.00009, Supplementary Figure 3C) showed the highest association with the 100 days performance. The minor allele A with a frequency of 0.42 accounted for a decrease in milk yield of 78 kg, 119 kg and 193 kg after 100, 200, and 305 days in lactation, respectively. The same SNP also suggestively affected (P BF = 0.08431) protein yield with a minor allele effect of −1.8 kg after 100 days in lactation. In the haplotype block around the top SNP (6:85,633,295-87,011,619, Figure 2 and Supplementary Table 8) 16 genes are located, among them CSN3 (kappa casein) is known as main milk protein gene and as a gene affecting milk yield and composition in Holstein cattle (Mckenzie et al., 1984;Ng-Kwai-Hang et al., 1984). Another SNP in the same region on chromosome 6 (rs110291935, 6:80,530,130, P BF < 0.02, Supplementary Figure 3D) decreased milk yield until 100, 200, and 305 days in LA1 by 71, 132, and 209 kg, respectively. The corresponding haplotype block (6:80,530,130-80,626,467) did not contain any gene (Supplementary Table 8). However, the surrounding 1 Mb region harbors three genes ENSBTAG00000050977, EPHA5 (EPH receptor A5), and ENSBTAG00000053125. Additional SNPs for milk yield in LA1 were found on chromosome 1 (rs42347234, 1:115,539,332, P BF = 0.03846; rs41255272, 1:143,812,919, P BF = 0.01478), chromosome 4 (rs42753220, 4:91,071,208, P BF = 0.01390), chromosome 5 (rs41652414, 5:114,935,739, P BF = 0.02814), and chromosome 25 (rs109027867, 25:11,019,450, P BF = 0.03396) (Supplementary Figures 3E-I). The minor allele of these SNPs showed an increase in milk yield by at least 64 kg, 141, and 204 kg for 100, 200, and 305 days performance data of LA1, respectively, expect for the SNP rs109027867 on chromosome 25 which showed a decrease of the minor allele by 128 kg for 200 days data.
In lactation 2 (LA2), the only association with milk yield was found on chromosome 20 (Table 1 and Figure 1) with the SNP rs110353352 (20:71,448,297, P BF = 0.00040, Supplementary Figure 3J). The minor allele C, which segregated at a frequency of 0.13, was advantageous for milk production increasing milk yield by 484 kg. This same SNP allele had also positive effects of 17.3 kg and 15.0 kg on fat and protein yield in LA2, respectively. The haplotype block of this SNP (20:71,378,518,297) contains five genes TRIP13 (thyroid hormone receptor interactor 13), BRD9 (bromodomain containing 9), ENSBTAG00000054687, TPPP (tubulin polymerization promoting protein), and CEP72 (centrosomal protein 72) (Supplementary Table 8).
In lactation 3 (LA3), the most significant association for milk yield was found on chromosome 1 ( Table 1 and Figure 1). The top SNP rs43246393 (1:79,757,250, P BF = 0.00020, Supplementary Figure 3K) was also associated with fat (P BF = 0.01868) and protein yield (P BF = 0.00007) in LA3 as well as with protein yield of the average across all three lactations (P BF = 0.02641). The minor allele T, segregating at a frequency of 0.34, accounted for 390 kg, 13.4 kg and 13.2 kg less milk, fat, and protein in LA3, respectively, and for 10.3 kg less protein in the lactation mean (LAm). The test for interaction between genotypes at this top SNP and lactation stage or lactation number showed significant effects on milk, fat, and protein yields (P BI < 0.03, Supplementary Table 7). The corresponding haplotype block around this SNP (1:79,606,618-79,757,250) contains three genes BCL6 (BCL6 transcription repressor), RTP2 (receptor transporter protein 2), and SST (somatostatin). In addition to the locus on chromosome 1, two SNPs on chromosome 8 were found to be associated (rs41793393, 8:53,663,120, P BF = 0.00508, Supplementary  Figure 3l; rs109542652, 8:53,867,972, P BF = 0.02538). As mentioned above, this region was also identified for the average milk yield over lactations 1-3 (LAm). The minor allele T of the lead SNP rs41793393 (frequency 0.19) is decreasing milk yield in LA3 by 356 kg and in LAm by 316 kg. The corresponding haplotype block (8:53,467,317-54,449,954) contains the genes GNA14 (G protein subunit alpha 14), GNAQ (G protein subunit alpha q), CEP78 (centrosomal protein 78), and PSAT1 (phosphoserine aminotransferase 1). Additional  For each trait, significantly associated SNPs are listed with chromosome number (Chr), chromosomal position in base pairs (bp), the SNP reference ID (SNP-ID), the given allele on the forward strand in the reference genome ARS-UCD1.2 (Ref), the alternative allele (Alt), the minor allele in the examined DSN population (MA), and the minor allele frequency (MAF). Furthermore, the associated trait, the number of cows in the specific analysis of this trait (n), the allele substitution effect of the minor allele (β), the standard error [SE(β)], and the p-value after Bonferroni-correction (P BF ) are given. P-values of P BF < 0.1, P BF < 0.05, and P BF < 0.01 were considered as suggestive (highlighted in gray), significant, or highly significant (highlighted in bold), respectively.

Effects on Milk Fat Yield and Content
Regions associated with milk fat yield were identified on chromosomes 1, 20, 24, and 25 (Table 1 Table 1).

Effects on Milk Protein Yield and Content
Highly significant effects (P BF < 0.01) on milk protein yield were identified on chromosomes 1 and 20 with data from LA3 and LA2, respectively (Table 1 and Supplementary Figure 6). The SNPs rs43246393 (1:79,757,250, P BF = 0.00007, Supplementary Figure 3X) and rs110353352 (20:71,448,297, P BF = 0.00503, Supplementary Figure 3Y) on these two chromosomes showed besides their effects on milk protein yield also an effect on milk and fat yield in the same lactations. While the minor allele of the chromosome 1 SNP was disadvantageous for all yield traits, the minor allele of the chromosome 20 SNP was favorable.
All association tests with milk protein content resulted in highly inflated p-values with λ ≥ 1.5 (Supplementary Figure 2). Therefore, we decided to focus on most significant results by lowering the genome-wide significance threshold 100-fold from P BF < 0.05 to P BF < 0.0005. This resulted in 16 associated SNPs for protein content in different lactation periods and numbers (Supplementary Table 4 and Supplementary Figure 7). These associations were found on chromosomes 2, 3, 5, 6, 10, 18, 20, and 28. The most significant association was found on chromosome 3 (rs110474631, 3:21,764,453, P BF = 4.5E-06, Supplementary Figure 3Z). This SNP was found for protein content in LA1 (200, 305 days), LA2, and LAm. The minor allele G of this SNP had a frequency of 0.20 and decreased protein content by on average 0.05 % points in the before mentioned lactation periods and numbers. The corresponding haplotype block (3:21,764,453-21,819,709) does not contain any gene, but the 1 Mb region harbors 23 different genes (Supplementary Table 9). The SNP rs110291935 located on chromosome 6 was the second highest association found in all periods of LA1 (6:80,530,130, P BF < 0.0003, Supplementary Figure 3A2). The minor allele T of this SNP increased protein content by 0.045-0.047 % points. The same SNP was already found to be associated with milk yield in all periods of LA1 (Table 1), where the minor allele decreased milk yield by 71, 132, and 209 kg in the 100, 200, and 305 days performance data, respectively. The corresponding haplotype and candidate genes were described in detail in the section of associations with milk yield. The third highest association was found on chromosome 10 at 46.5 Mb with the top SNP rs109605174 (10:46,450,562, P BF = 6.5E-06, Supplementary Figure 3B2) in LA2. The minor allele of this SNP increased protein content by 0.055 % points. The neighboring SNPs rs109277788 (10:44,773,979, P BF = 0.00002) and rs43625129 (10:47,670,717, P BF = 9.2E-06) increased also protein content in LA2 and in the first 100 days in LA1. The corresponding haplotype block (10:46,330,098-46,672,954) comprises seven genes ENSBTAG00000054388, ENSBTAG00000050908, ENSBTAG00000019474, FBXL22 (Fbox and leucine rich repeat protein 22), USP3 (ubiquitin specific peptidase 3), CA12 (carbonic anhydrase 12), and APH1B (aph-1 homolog B, gamma-secretase subunit).

Missing Associations to Regions With Known Effects on Milk Production
Since some genes are well known for their effects on milk yield and composition in Holstein cattle, we separately tested the association of SNPs in close proximity (<500 kb) to such candidate genes (Ogorevc et al., 2009). We found 19 SNPs on the SNP chip that were in close proximity to 10 candidate genes for milk production (ABCG2, CSN1S1, CSN1S2, CSN2, DGAT1, GHR, GPAT4, PAEP, PRLR, and SPP1). None of those 19 SNPs were significantly or suggestively associated with any of the investigated traits in DSN (Supplementary Table 10 and Supplementary Figure 8).
Especially, for the region on chromosome 6 between 80.5-87.2 Mb (flanking SNPs: rs110291935 and rs41591365), which was significantly associated with milk and protein yield in all investigated periods of LA1 in our study, many different associations could be found in Cattle QTLdb. These associations included milk and protein yield as well as milk kappa-casein content in Holstein cattle (Meredith et al., 2012;Buitenhuis et al., 2016;Jiang et al., 2019), curd firmness and cheese fat recovery in Brown Swiss cattle (Dadousis et al., 2017), health traits such as somatic cell score in Holstein cattle (Jiang et al., 2019), and exterior traits such as facial pigmentation in Fleckvieh cattle (Mészáros et al., 2015).

DISCUSSION
In this study, we investigated genetic factors underlying milk production traits in DSN cattle. We detected 41 significant and 20 suggestive SNPs mostly found in regions previously associated to milk production traits or other trait categories. The high overlap of associated SNPs in this study with other studies shows that the regions that influence milk traits in DSN may have similar or even pleiotropic functions in other breeds.
We observed a high overlap of identified genomic regions in DSN and Holstein, even if we did not find significant effects of major candidate genes that affect milk production in Holstein cattle. While the key genes driving e.g., milk yield or fat and protein content might be different between DSN and Holstein, it is surprising that especially the well described association in the region of and around DGAT1 was not detected in DSN although DSN and Holstein cattle are closely related. Although the MAFs of the three closest SNPs to DGAT1 gene are high (MAF = 0.44-0.47) in DSN (Supplementary Table 10), we observed that the actual variant causing the A232K substitution is very rare having a MAF of 0.02 (preliminary results of sequencing data from 57 DSN cattle). It is possible that not only the causal variant of DGAT1, but also of other known milk genes are rare in DSN and thus do not have much effect in the DSN population.
Several interesting candidate genes were found including CSN3 gene (kappa casein) that belongs to the casein cluster on chromosome 6 at 86 Mb, which are known to influence milk fat and protein content as well as milk properties (Caroli et al., 2009), also plays an important role in DSN for milk and protein yield. The genetic architecture of the casein gene cluster of DSN in comparison to other breeds was recently investigated in detail (Meier et al., 2019). Among 14 investigated breeds in that study, the casein gene cluster of DSN was most similar to Danish Red. One of the most significant associations on chromosome 1 at 79 Mb was close to SST (somatostatin) and BCL6 (BCL6 transcription repressor). Since SST regulates the secretion of pituitary hormones including prolactin (Eigler and Ben-Shlomo, 2014), it may contribute to the regulation of milk production (Dybus, 2002;Alipanah et al., 2007). And BCL6 was found to be expressed in mammary epithelium and also in breast cancer (Logarajah et al., 2003). Also interesting is SLC6A3 gene (solute carrier family 6 member 3) on chromosome 20 at 71 Mb to which the Gene Ontology term "lactation" (GO:0007595) was assigned. This assignment was inferred from electronic annotation by Gene Ontology Consortium (Ashburner et al., 2000) and is possibly based on the homology of this gene in mice where it was shown that mutations in this gene cause lactation failure (Bossé et al., 1997). Beside SLC6A3, other solute carrier family genes were found in the same region: SLC6A18 (family 6 member 18), SLC6A19 (family 6 member 19), SLC9A3 (family 9 member 3), and SLC12A7 (family 12 member 7). Also, genes coding for enzymes or receptors targeting milk ingredients such as prostaglandin and riboflavin were found, e.g., PTGR1 (prostaglandin reductase 1) on chromosome 8 at 101 Mb, PTGIR (prostaglandin I2 receptor) on chromosome 18 at 54 Mb, and FLAD1 (flavin adenine dinucleotide synthetase 1) on chromosome 3 at 15 Mb.
Since population stratification has a significant effect on the number of false positive QTLs detected during association analysis, we tried multiple approaches to reduce the inflation factor λ for different traits of interest. The higher the inflation factor λ, the higher the number of false-positively reported significant associations in GWAS analyses. Although we tried to compensate for population stratification by using pairwise population concordance test and paternal pedigree information, we were unable to capture the population structure entirely. That is most likely due to the fact that the population size of DSN is small, and the number of breeding bulls used each generation is limited. As such, DSN animals are not unrelated and complex family relationships exist between animals, which is a common issue in small populations. Thus, we assume that the residual inflation is genetically caused due to high linkage disequilibrium between SNPs. Moreover, using both population stratification and pedigree information led to an overfitting of the model that took away genetic variance. Further genetic variance was lost as we required that each sire was represented by at least 20 offspring in the phenotype data and thus the number of sires contributing to the analyzed DSN population was reduced by two-thirds. By doing so, also the number of animals available for GWAS was reduced and as a result the statistical power to detect significant associations. Nonetheless, we believe that lowering the genetic variance and statistical power in favor of lower inflation of p-values is the proper way to prevent false positive associations.
The results of this study could be used as prior information to detect SNPs associated with milk production traits in other (related) breeds. For that, cattle from other breeds could be genotyped for single SNPs (e.g., top SNPs) instead of being genotyped using SNP chips or even being whole genome sequenced which would reduce costs. Furthermore, since genotypic data of a sufficient amount of DSN cattle was not available yet, genomic breeding is not yet established. The identified SNPs and their associations could be directly used to make breeding decisions by DSN breeders. Even predictions about performance of young DSN cows that do not have any phenotypic data yet could be made if genotypic data would be available. Thus, a first step toward genomic breeding for milk traits in DSN cattle was made. Further analyses are needed to evaluate whether these associated SNPs do not show any negative effect on other trait categories such as fertility.

CONCLUSION
The genome-wide association analysis identified several significant regions associated with milk traits in diverse lactation periods and numbers in dual-purpose DSN cattle. In the view of the fact that the biggest possible sample size of the investigated DSN population was still relatively small, the most promising SNPs for improving milk production in DSN are those that were also significantly associated with milk traits in other studies. These SNPs were located on chromosome 3, 5, 6, 8, 10, 20, and X. In contrast, no association to the well-known DGAT1 region could be detected in DSN. Further analysis is needed to make sure that these SNPs do not negatively affect other important production traits or DSN characteristic traits such as carcass and meat, conformation, fertility, and health before using them as breeding markers. Nevertheless, the results of this study are a basis for further genetic analysis to identify genes and causal variants that affect milk traits in DSN cattle as well as in other related breeds.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repositories and accession numbers can be found below: The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) European Nucleotide Archive (ENA) and European Variation Archive (EVA), https://www.ebi.ac.uk/ena/browser/ home and https://www.ebi.ac.uk/eva/, PRJEB42513 (project) and ERZ1701738 (analyses).

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because samples were collected based on routine procedures on these farm animals. Ear tags were taken as part of the required registration procedure, blood samples were taken by a trained veterinarian to perform standard health recording. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
GB, PK, and DA designed the study. PK performed all computational and statistical analysis, interpreted the data, and drafted the manuscript. DA helped with the statistical analysis. SK and KM provided genotypes of 61 cows. DA and GB helped draft the manuscript. All authors read and approved the final manuscript.