A Review of Genomic Models for the Analysis of Livestock Crossbred Data

Livestock breeding has shifted during the past decade toward genomic selection. For the estimation of breeding values in purebred breeding schemes, genomic best linear unbiased prediction has become the method of choice. Systematic crossbreeding with the aim to utilize heterosis and breed complementary effects is widely used in livestock breeding, especially in pig and poultry breeding. The goal is to improve the performance of the crossbred animals. Due to genotype-by-environment interactions, imperfect linkage disequilibrium, and the existence of dominance and imprinting, purebred and crossbred performances are not perfectly correlated. Hence, more complex genomic models are required for crossbred populations. This study reviews and compares such models. Compared to purebred genomic models, the reviewed models were of much higher complexity due to the inclusion of dominance effects, breed-specific effects, imprinting effects, and the joint evaluation of purebred and crossbred performance data. With the model assessment work conducted until now, it is not possible to come to a clear recommendation as to which existing method is most suitable for a specific breeding program and a specific genetic trait architecture. Since it is expected that a superior method includes all the different genetic effects in a single model, a dominance model with imprinting and breed-specific SNP effects is proposed. Further progress could be made by assuming realistic covariance structures between the genetic effects of the different breeding lines, and by using larger marker panels and mixture distributions for the SNP effects.


INTRODUCTION
The crossing of different lines or breeds is widely used in animal breeding with the main aim to produce superior offspring. This superiority results from heterosis and from breed complementary effects. Continuous and discontinuous crossbreeding schemes have been designed and are implemented in various livestock species (Lopez-Villalobos et al., 2000;Samorè and Fontanesi, 2016). In discontinuous schemes, crossbred animals are used solely for production and are not selected as parents of the next generation. Breeding takes place in the parental breeds and the breeding goal is usually to improve crossbred performance. The level of organization in such a system is high and it is sometimes difficult to utilize by-products, such as male offspring of mother lines. These schemes can be predominantly found in livestock species with a high female reproduction rate such as pigs and poultry. In continuous breeding schemes, the female crossbreds are used as parents to breed the next generation. These systems are sometimes implemented in livestock species with a low female reproduction rate such as cattle. Since there are substantial non-additive effects for reproduction traits in dairy cattle (Jiang et al., 2017), the aims of crossbreeding in dairy cattle are to improve reproduction traits and other functional traits by exploiting heterosis and imprinting and by removing inbreeding depression (Sørensen et al., 2008;Buckley et al., 2014).
A further form of crossbreeding is the upgrading of low-performance breeds with high-yielding breeds. This introgression of genes from high-yielding breeds increases the production level in subsequent generations and reduces inbreeding depression by increasing the genetic diversity of the low-performance breed. This breeding system has frequently been applied to local breeds, such as the German Vorderwald cattle (Hartwig et al., 2014(Hartwig et al., , 2015. However, if upgrading is repeated over several generations, then the breed eventually goes extinct because the native alleles are removed from its gene pool. The formation of a synthetic breed can also be seen as a special form of crossbreeding. A well-known example is the establishment of the so-called Schwarzbuntes Milchrind in the former East Germany (Freyer et al., 2008).
Livestock breeding has shifted toward genomic selection, which is now frequently implemented in large pure breeds. The core of the system that has been implemented in pure breeds is a reference population that consists of genotyped and phenotyped animals. The phenotypes are either the animal's own performance records, or deregressed conventional breeding values. The reference population is needed for the prediction of marker effects. The marker effects are then used for predicting genomic breeding values of the genotyped selection candidates. The reliability of genomic breeding values depends on the size of the reference population, on the effective number of chromosome segments, and on the method used for the prediction of marker effects (Goddard, 2009).
Extensive research has been dedicated to develop statistical models for the prediction of marker effects. These statistical models include the SNP-BLUP model that assumes normally distributed SNP effects, various Bayesian models that assume more heavy-tailed distributions, as well as non-parametric and semi-parametric models (Meuwissen et al., 2001;Gianola, 2013). More complex models assume different SNP variances, depending on the type of control region the SNP belongs to MacLeod et al. (2016). Some models avoid the prediction of marker effects by building a genomic relationship matrix based on SNP genotypes. The most prominent method based on genomic relationships is GBLUP, which is an equivalent model to SNP-BLUP (VanRaden, 2008;Goddard, 2009). The genotyped selection candidates are included in the model, and their genomic breeding values are calculated by utilizing their genomic relationships with the reference population. GBLUP assumes that all animals are genotyped, which is in general not the case. Therefore, the genomic breeding values are blended in a second step with pedigree-based breeding values to obtain genomically enhanced breeding values on which selection decisions are based. This two-step procedure can be avoided with so-called single-step GBLUP models (ssGBLUP). They were developed as extensions of GBLUP. Single-step models include genotyped and non-genotyped animals simultaneously (Legarra et al., , 2014Aguilar et al., 2010;Christensen and Lund, 2010) and assume a particular covariance structure for the breeding values that is computed from genomic and pedigree-based relationships. Fernando et al. (2014) extended the single step model toward non-normally distributed marker effects. In purebred routine application mostly additive effects are considered, with dominance being an integral part of the estimated breeding values. Some genomic models were extended toward accounting for dominance explicitly, but this increased the realibilities of the breeding values only slightly (Su et al., 2012;Wellmann and Bennewitz, 2012;Azevedo et al., 2015).
To summarize, it seems that in practical purebred genomic evaluations, GBLUP and ssGBLUP have and will become the models of choice, and non-additive gene effects are usually not an issue. The picture is however somewhat different if data from crossbred animals in combination with the parental purebred data is analyzed. The potential applications of genomic models with non-additive genetic effects have been reviewed by Varona et al. (2018). The main breeding goal is in this case to improve the performance of the crossbred animals. Due to genotype-by-environment interaction, imperfect LD, and the existence of dominance, epistasis and imprinting, purebred and crossbred performances (PP and CP, respectively) are not perfectly correlated (e.g., Wei and van der Werf, 1995;Dekkers, 2007;Zumbach et al., 2007;Duenk et al., 2019). Wientjes and Calus (2017) reviewed existing literature about purebredcrossbred correlations in pigs. The average from 201 reported correlation coefficients was 0.63 with 50% of the reported coefficients being between 0.45 and 0.87. The purebred-crossbred correlation affects the optimal design of the reference population (van Grevenhof and van der Werf, 2015) and the choice of an appropriate genomic model.
While genomic models are well-established for pure breeds, much research has been conducted in the recent years to develop genomic models for the analysis of crossbred data. The aim of this study is to review genomic models for the prediction of crossbred performance that were recently developed and were evaluated either using simulated or real crossbred data.

GENOMIC MODELS
Genomic models for crossbred data are extensions of purebred models. The extensions were made in several directions. Most genomic models for the analysis of crossbred data are developed for two-way crosses. A two-way cross X is created from a sire line A and a dam line B, which are usually not inbred. The pure lines have breeding values a A and a B for PP, and breeding values c A and c B for CP. Typically, some animals are genotyped, whereas others are not. The goal is to obtain accurate predictions of the breeding values for CP by utilizing phenotypic information from genotyped and ungenotyped purebred and crossbred animals. An overview over the considered models is given in Table 1.
The SNP alleles are usually assumed to be biallelic, so they may be coded as alleles 1 and 2. Most authors use centered allele content matrices as proposed by VanRaden (2008). The centering does not affect the predictions, but affects the model-based reliabilities (Strandén and Christensen, 2011). We denote with the centered allele content matrix for the genotyped animals from line A, whereby the allele content G A im ∈ {0, 1, 2} of animal i from line A is the number of copies of allele 2, animal i has at SNP m, and P A im is the frequency of allele 2 of SNP m in line A. Moreover, we denote with the centered allele origin matrix for alleles from cross X that originate from line A. That is, G A X im ∈ {0, 1} is the number of copies of allele 2, crossbred animal i has obtained from sire line A at SNP m. These matrices are needed to define genetic values of purebred and crossbred animals. The vector with breeding values for CP for animals from line A has the representation where α A is the vector with allele substitution effects for CP. The vector with breeding values for PP has the representation whereα A is the vector with allele substitution effects for PP. The equations for a B and c B are similarly. Most genomic models for two-way crosses utilize, that the vector a X with additive genetic values of the crossbred animals can be decomposed into a contribution c A X that comes from sire line A, and a contribution c B X that comes from dam line B. That is, where The contribution c A X from line A can be further decomposed into a contribution that comes from the breeding values c A for CP, and into a vector m A X that contains the Mendelian sampling terms of the transmitted gametes (Wei and van der Werf, 1994). That is, where matrix Z X A assigns animals from line A to their crossbred offspring. Different models have been developed for predicting CP, which can broadly be classified into additive models and dominance models. While some models predict the breeding values for CP directly with Equation (5), others predict the vector α A with allele substitution effects for CP. In the latter case, the estimated breeding valuesĉ A for CP in line A are obtained by substituting α A with the predictionα A in Equation (1).

Additive Models
Different additive models have been proposed in the literature. Some models assume that the crossbred animals are genotyped, whereas others do not. The general additive model for a twoway cross is a trivariate model that has two equations for the parental lines, and one equation for the cross. It has the general representation where y A , y B , y X are vectors with phenotypic records of the respective subpopulation, b A , b B , b X are vectors of fixed effects with design matrices X A , X B , X X , and u A , u B , u X are vectors of non-genetic random effects with design matrices Z A , Z B , Z X . Finally, a A , a B are the breeding values for PP, and E A , E B , E X are the residual terms. The term ". . . " in the third equation depends on the respective model. The first two model equations are needed because PP and CP are genetically correlated (Wientjes and Calus, 2017), so phenotypic records of purebred animals increase the reliabilities of the breeding values for CP.

The Parental Additive Model
The parental additive model is based on Equations (2), (3), and (5), and is suitable when the crossbred animals are not genotyped. The model assumes that the Mendelian sampling terms are part of the residuals, so the model equations become

The BSAM and ASGM Model
The model with breed-specific allele effects (BSAM) and the model with breed-independent allele effects, which is also called the across-breed SNP genotype model (ASGM) are based on Equations (2-4), and require that the crossbred animals are genotyped. While the ASGM model predicts one effect per SNP, the BSAM model predicts one effect for the paternal allele, and one for the maternal allele of the crossbred animals. Originspecific allele effects may occur e.g., due to a different LD pattern between the marker and the QTL, different gene frequencies at the QTL, imprinting effects, or the epistatic effects may be different in the pure breeds. This results in different effects of the marker alleles and thus affects the estimated breeding values. The first two equations of the BSAM and ASGM model are as above, whereas the third model equation becomes for the BSAM An equivalent representation for the ASGM model is Ibánez-Escriche et al. (2009) predicted CP of the parental lines from genotyped crossbred animals with BSAM and ASGM, whereby the breed-specific allele substitution effects of the BSAM model were a priori independent. The allele substitution effects were estimated with BayesB, which is a method that assumes that many of them are actually zero. An oligogene trait was simulated with breed-independent QTL effects. Although the SNP effects are expected to be breed-specific due to differences in LD between markers and QTL, the authors found that the BSAM model outperformed ASGM only if the number of markers was low, the number of records for training was high, and if the parental breeds were distantly related. Lopes et al. (2017) used the BSAM model with normally distributed SNP effects to predict breeding values for CP from crossbred data, and compared the results with conventional GBLUP. The model provided similar prediction accuracies as conventional GBLUP for the traits litter size and gestation length in pigs. It may be not superior to GBLUP because the allele substitution effects of the different breeds were implicitly assumed to be uncorrelated, which is an assumption that is not likely to be fulfilled. Sevillano et al. (2019) extended the BSAM and ASGM model toward a three-way cross and distinguished SNP that showed a strong trait association from all remaining SNP. For the trait associated SNP breed-specific effects were estimated, whereas for the remaining SNP one effect was estimated, regardless of the allele origin. This model was compared with the BSAM model and with the ASGM model for the trait daily gain by assuming normally distributed SNP effects. Purebred as well as crossbred data was used for training. The results showed a superiority of their method only if the estimated genetic correlations between PP and CP for the trait associated SNPs and the remaining SNPs were unequal. Vandenplas et al. (2017) derived equations for predicting the reliability of genomic breeding values for CP for BSAM and ASGM models and assumed normally distributed SNP effects.
The authors found that BSAM outperformed ASGM for a specific parental line, if the effective number of chromosome segments in the crossbred reference animals that originate from the parental line is less than half the effective number of all chromosome segments that are independently segregating.

Additive Single Step Model
While BSAM has the disadvantage that all crossbred animals have to be genotyped, the parental additive model has the disadvantage, that the information provided by the Mendelian sampling terms cannot be utilized for prediction. These problems could be resolved by using a trivariate model of the form that includes both, genotyped and phenotyped animals. Christensen et al. (2014) derived the joint covariance matrix A A of c A X , c A , and a A by using the pedigree-based model of Wei and van der Werf (1994) as a starting point. The authors derived the covariance matrix A A from pedigree relationships, and replaced it in a subsequent step by a covariance matrix H A that combines pedigree and genotype information.
The authors evaluated six growth and meat traits and found that the genetic correlations between purebred and CP were larger than 0.69 for all traits. The accuracies of the genomic breeding values were higher than those obtained from univariate singlestep models that took either purebred or CP into account, and also higher than those obtained with pedigree-based models.

Dominance Models
Crossbreeding utilizes heterosis and breed complementarity. A widely accepted hypothesis is that heterosis arises predominantly from dominance effects. An animal carries a dominance effect only if it is heterozygous at a particular QTL. We denote with is the heterozygosity of SNP m in line X . The dominance model assumes that the vector g X with genotypic values of the crossbred animals has the representation where µ X is the population mean, a X is the vector with population-dependent additive effects, and d X is vector with population-dependent dominance effects.
The genotypic values of purebred animals are defined accordingly. The trivariate dominance model for a two-way cross and the parental lines has therefore the representation which we call the dominance model with line-dependent effects.
The vector c A with breeding values for CP from breed A has the representation given in Equation (1), but the vector with allele substitution effects for CP is where p B is vector with allele frequencies in the opposite line, and the Hadamard product "• ′′ is the component-wise product.
The breeding values and allele substitution effects for line B are defined accordingly. Predictionsâ X andd X of a X and d X are needed to get predictions of the allele substitution effects for CP in line A with equation Some solvers are unable to account for the fact that E d X = µ X d 1 = 0 for most traits. As shown by Xiang et al. (2016b), one may write d X = d X * + µ X d 1 such that E d X * = 0. Then, the term Z X H d X in Equation (7) from which the SNP effects can be backsolved. Thereby, the animal effects satisfyã X = Z X G a X , andd * X = Z X H d X * , and so on. The joint covariance matrices of the animal effects are given in Christensen et al. (2019).
The SNP effects in Equation (7) were assumed to be linedependent, which may be the case because the LD between SNP and QTL differs between lines. This may be neglected if the marker panel is sufficiently large. In this case, the SNP effects can assumed to be line-independent, and we obtain the simplified model which we call the dominance model with lineindependent effects. Vitezica et al. (2013) emphasized that two different parameterizations of the dominance model exist. The first parameterization, which is given by Equation (6), is suitable for two-way crosses, and includes the additive and dominant SNP effects. In contrast, the second parameterization includes the allele substitution effects and the dominance deviations of the SNP. Both parameterizations are equivalent, but their interpretation is different.

Model Evaluation
Zeng et al. (2013) compared a Bayesian dominance model with the corresponding BSAM model and the corresponding ASGM model. A BayesCπ type method was used to estimate the marker effects, so the prior assumption was that the SNP effects are either zero, or come from a normal distribution. The comparison was done for a simulated two-way crossbreeding program. A number of 20 generations of selection was simulated with the aim to improve CP in both parental lines. The marker effects were estimated only once in generation one from crossbred animals and used in all subsequent generations. The simulated traits showed a different degree of dominance variance, ranging from "large" to "realistic, " or null. The dominance model was superior to the BSAM model and to the ASGM model. This superiority depended on the fraction of dominance and thus heterosis in the data, but even for situations where no dominance was simulated, the accuracy of the dominance model was similar to the additive model, indicating the robustness of the model. It can tentatively be concluded, that the use of a dominance model is in general advisable, even if dominance is not an important source of trait variability. Xiang et al. (2016b) used a dominance model with linedependent effects for a two-way cross and the parental breeds.
The SNP effects were normally distributed, and the additive and dominance effects of the three different populations were correlated. The authors found that the increased predictive ability of the dominance model arose solely from capturing inbreeding depression. This suggests that dominance effects of individual QTL have not been captured. The reason may be that a 60K SNP panel is not sufficient for achieving high LD between markers and QTL, and that the normality assumption is unlikely to be fulfilled. Esfandyari et al. (2016) compared a Bayesian dominance model with the corresponding Bayesian ASGM model at the example of litter size in a two-way pig cross, whereby BayesC of Habier et al. (2011) was used for prediction. Training was on the parental lines. The prediction accuracies for PP and CP obtained with the dominance model were both higher than those for PP obtained with the ASGM model.

Implications for Breeding Programs
All additive models for predicting CP rely on phenotypic data collected from crossbred animals. This can be problematic in situations where the crossbred animals are not individually identified and thus such data collection pipeline is not implemented. This is likely the case on many farms housing crossbred animals. While additive models require phenotypes from crossbred animals, this is not the case for dominance models because the breeding values for CP can be derived from additive and dominance effects that are predicted in the pure breed, and from the allele frequencies in the opposite breed. Esfandyari et al. (2015a) proposed therefore to use dominance models for selecting purebred animals for CP based on purebred phenotypic and genotypic information only. They did a simulation study and estimated the marker effects with Bayesian LASSO (Park and Casella, 2008;los Campos et al., 2009). The results showed that the gain in CP was higher when the purebreds were selected for CPs, which demonstrated the feasibility of the method even when no crossbred data is available. Moreover, combining several related lines into a single reference population increased the prediction accuracy. However, as shown by Esfandyari et al. (2015b), training on crossbred animals leads to a higher selection response than training on purebred animals. A likely explanation is, that the level of heterozygosity was higher than in the purebred data.
Although genomic selection for CP is a promising strategy to increase selection response for CP in the short and medium term, Esfandyari et al. (2018) found that genomic selection for CP leads eventually to lower CP in the long term than genomic selection on PP. This hold regardless of whether training was on purebred or crossbred animals.

Dominance Model With Imprinting
Dominance effects, as well as additive effects may depend on the breed of origin, which may be due to imprinting or breed complementarity. It could therefore be advantageous to account for imprinting explicitly. A dominance model with imprinting needs to distinguish between the paternal and the maternal allele. If an animal has received allele A 1 from line A and allele A 2 from line B, then we denote its genotype as A 1 A 2 . The centered indicator matrix for genotype A 1 A 2 is given by , 1} equals one, if animal i from cross X has genotype A 1 A 2 at SNP m, and H A 1 A 2 X im is the proportion of animals from cross X that have this genotype at SNP m.
The dominance model with imprinting assumes that the vector g X with genotypic values of the crossbred animals has the representation where µ X is the population mean, vectors a X A and a X B contain breed-of-origin dependent additive effects, and vectors d X A and d X B contain breed-of-origin dependent dominance effects. The model equation for the crossbred animals becomes If imprinting in the parental lines is neglected, then the model equations for the parental lines remain as in Equation (7). The vector with allele substitution effects for CP of line A is in this case where p B is the vector with allele frequencies in the opposite line. The proof is given in the Supplementary Material. When the SNP effects in the cross do not depend on the breed of origin, then the model simplifies, and becomes identical to the dominance model with line-dependent effects. Nishio and Satoh (2015) proposed two alternative parameterizations for models with dominance and imprinting and fitted them by assuming normally distributed SNP effects. Their first model includes an additive effect, a dominance effect, and an imprinting effect for the heterozygous genotype, while their second model includes a paternal and a maternal gametic effect, and a dominance effect. The models provided in a simulation study more accurate estimates of genotypic values than GBLUP. While the models of Nishio and Satoh (2015) have the advantage that only 3 effects are needed in the equivalent SNP model for modeling the contribution of each SNP to the genotypic value of an animal, the model in Equation (9) has the advantage that more rigorous prior assumptions can be made for the joint distribution of the effects. That is, if the paternal lines are closely related, then the additive effects a X A and a X B could assumed to be a priori highly correlated, as well as the dominance effects d X A and d X B . However, the parameterization does not allow to predict the vectors a X A , a X B , d X A and d X B individually. Esfandyari et al. (2015b) compared in a simulation study a Bayesian dominance model with imprinting with the corresponding dominance model with line-independent effects, but used a different parameterization. The model considered imprinting because it included a separate effect for each phased genotype. Compared to the model proposed above, it has the disadvantage that the effects have no direct interpretation as additive and dominance effects. The genetic effects of the parental breeds were a priori independent. Even though the authors did not simulate imprinting, they found that the dominance model with imprinting was superior, if the reference population was sufficiently large, and if both lines were not closely related. The reason may be that the LD between markers and QTL was different in the cross and in the parental lines, so the additive effects and dominance effects were population-dependent.

DISCUSSION
In this paper, genomic models for the analysis of discontinuous crossbred data were reviewed. Compared to purebred genomic models, the reviewed models were of much higher complexity due to the inclusion of dominance effects, breed-specific effects, imprinting effects, and the use of PP and/or CP data. In the following some additional aspects regarding the distribution of the SNP effects and the model choice are considered.

Distribution of SNP Effects
The normal distribution is the most common assumption about the distribution of SNP effects. Such models have the advantage, that they have an equivalent representation as animal models with genomic covariance matrices for which fast solvers exist, such as DMU (Madsen et al., 2010), WOMBAT (Meyer, 2007), ASReml (Gilmour et al., 2009), blupf90 (Misztal, 1999), or MiX99 (Vuori et al., 2006). Although the assumption of a normal distribution is not likely to be fulfilled when large marker panels are used, the experience with purebred data suggest that the reliabilities of the breeding values are only slightly worse than those obtained with non-normally distributed marker effects. However, the situation in crossbreeding is different because the parental lines are commonly distantly related, and it may be envisaged to evaluate all lines simultaneously in order to increase the reliabilities of the breeding values. This requires that all QTL are in high LD with at least one marker, which implies the necessity to use a large marker panel. However, if the marker panel is large, then only few markers are needed to capture the effect of any QTL. Consequently, the true effects of most markers are actually zero. The model for genomic selection should account for this and assume as a prior distribution for the SNP effects a mixture of two distributions. One component provides the distribution for markers that are in strong LD with a QTL, and the other one is actually zero. In this case, a randomvariable γ m is commonly introduced, which indicates whether the effects of an SNP m are different from zero. Well-known examples are BayesB (Meuwissen et al., 2001), BayesC (Habier et al., 2011), and BayesR (Erbe et al., 2012). Such algorithms are usually implemented with MCMC algorithms, which results in long computation times. However, alternative and faster implementations are available for some models (e.g., Meuwissen, 2009;Shepherd et al., 2010).
For models with additive and dominance effects, an important aspect is, whether these effects are a priori independent or not. It may be advantageous to assume that all effects of a particular SNP m are of the same order of magnitude. This is possible if all effects of a particular SNP m have conditionally on the common covariance matrix γ m σ 2 m a normal distribution, where σ 2 m ∼ Inv-χ 2 (v, s) and is an appropriately chosen covariance matrix. For the dominance model with line-dependent effects, this means that It can be shown that in this case, all effects of SNP m would have for γ m = 1 a t-distribution with v degrees of freedom, and are for γ m = 0 equal to zero. Moreover, the magnitude of the effect size would be similar for all effects of a given SNP m, which reduces the proportion of overdominant SNP. Developing a fast algorithm for such a model is an area for future research.

Model Choice
The most suitable model for a breeding program depends on the achievable accuracies for the breeding values of the selection candidates, and on the available data. Among the additive models, the parental model provided the least accurate predictions for CP, which is because the Mendelian sampling terms are part of the residual and can therefore not be utilized for prediction. It has, however, the advantage that the crossbred animals do not need to be genotyped and may therefore be suitable for animals with low economic value. The BSAM and ASGM models provided similar results in most cases. The BSAM model, however, needs the trace of the alleles from the purebred parent breed to the crossbred end product, which is a source of potential errors. This might even be more a problem when more complex crossbred structures are involved, e.g., three-or four-way crossbred data. Vandenplas et al. (2016) and Sevillano et al. (2016) developed a statistical pipeline for this purpose and applied it to a three-way crossbred pig data set.
The reviewed papers suggest that the dominance models provide more accurate genomic breeding values for CP than the additive models. Although Xiang et al. (2016b) showed that this gain in accuracy results in the case of normally distributed SNP effects almost solely from capturing inbreeding depression, this may be not the case when large marker panels and appropriate Bayesian models are used for evaluation. Dominance models have the additional advantage that breeding values for crossbred performance can be obtained from purebred animals, so phenotyping and genotyping crossbred individuals may not be necessary. However, as shown by Esfandyari et al. (2015b), the accuracy of the breeding values can be increased when phenotyped and genotyped crossbred individuals are included in the reference population.
Three different dominance models have been applied to crossbred data, which are the dominance model with lineindependent effects, the dominance model with line-dependent effects, and the dominance model with imprinting. The dominance model with line-dependent effects is likely to be inferior to the model with line-independent effects if the SNP effects of the different lines are falsely assumed to be statistically independent, the reference population is small, and the lines are closely related. This could be avoided by specifying a covariance between the SNP effects of the different lines.
When imprinting is relevant, then a dominance model with imprinting is of interest. For example, Jiang et al. (2017) found that there is substantial imprinting for reproduction traits in dairy cattle. The application of imprinting models requires that the crossbred animals are genotyped and that the alleles are traced from the parental lines to the crossbred animals. Unfortunately, to the best of our knowledge, these models are not well-analyzed yet. More research should be done in this area, which includes to analyze all models with common data sets.

CONCLUSION
Genomic models for crossbred data are of much higher complexity than models for purebred data, which results from the inclusion of dominance effects, breed-specific effects, imprinting effects, and from the joint evaluation of PP and CP. Although much research has already been done to develop genomic models for crossbred data, it can be expected that further progress can be made by developing statistical models that include all the different genetic effects in a single model, assume realistic covariance structures between the genetic effects of different breeding lines, use large marker panels, and assume realistic distributions for the SNP effects. The comparisons made in the reviewed papers are not sufficiently comprehensive to come to a clear recommendation as to which existing method is most suitable for a specific breeding program and a specific genetic trait architecture. Some papers suggested a superiority of dominance models. In the reviewed papers, the focus was on discontinuous crossbreeding schemes. This was because, to our best knowledge, no genomic models have been published that are specifically designed for continuous crossbreeding schemes.

AUTHOR CONTRIBUTIONS
RW did the formal model comparison. JS, RW, and JB wrote the paper. DH contributed to the writing. All authors contributed to the article and approved the submitted version.

FUNDING
This work was financially supported by the German Federal Ministry of Food and Agriculture (BMEL) through the Federal Office for Agriculture and Food (BLE), grant number 2817ERA10D. The project has received funding from the European Union's Horizon 2020 Research and Innovation Program under grant agreement no. 696231 -ReDiverse (Biodiversity within and between European Red Dairy Breeds). JS was partly supported by the H. Wilhelm Schaumann Foundation, Hamburg, Germany, which is gratefully acknowledged.

ACKNOWLEDGMENTS
The manuscript has benefitted from the critical and helpful comments of the reviewers.