Variance Component Decomposition for Growth Traits of the Bay Scallop (Argopecten irradians irradians)

The bay scallop (Argopecten irradians irradians) is one of the most important shellfish species in China. Since their introduction into China, only mass selection has been used in bay scallop breeding. With its gradual expansion and shortage of mate selection, population homozygosity increased, and fitness decreased. To investigate the effects of inbreeding and provide reference for improving breeding strategies and mating management, the variance components of the growth traits of the bay scallop were decomposed with genomic relationship matrices. The results indicated that the genetic variations in shell height and length were mainly accounted for by the additive effects. The genetic variation in shell width was mainly caused by dominance or dominance-by-dominance epistasis. The genetic variation in body weight was accounted for by dominance. No significant directional dominances were detected for all growth traits. Cross-validation for genomic prediction showed that including insignificant inbreeding in the genomic prediction model is not necessary, and we suggest that the genomic prediction model should be optimized with both likelihood ratio tests and cross-validation before utilization in practice.


INTRODUCTION
China is the largest shellfish producer globally (FAO, 2018). The bay scallop (Argopecten irradians irradians) was introduced into China from North America in 1982 (Zhang et al., 1986). Since then, the bay scallop has gradually expanded and dominated the scallop aquaculture because of its relatively faster growth than that of Chinese indigenous scallop species, such as the Zhikong scallop (Chlamys farreri) (Xiao et al., 2005;Guo and Luo, 2016). Upon introduction, the production of bay scallops was initially based primarily on hatchery-produced seed without systematic selection . Over time, mass selection began to be utilized in bay scallop breeding, and the growth traits of the scallops were genetically improved (Zheng et al., 2004;Wang et al., 2020).
However, the expansion of bay scallop aquaculture and shortage of mate selection has led to its remarkable decrease in survival rate in hatcheries in the last decade. This is thought to be a result of consecutive generations of inbreeding (Zheng et al., 2007, which increased the homozygosity of the population and decreased the fitness and trait values of this species (Charlesworth and Charlesworth, 1987). Inbreeding depression was also found in other species of scallop, such as the Catarina scallop (Argopecten circularis) (Ibarra et al., 1995) and Yesso scallop (Patinopecten yessoensis) (Li et al., 2007). Inbreeding also increased the segregation distortion of genetic markers (Wang et al., 2012).
For single-locus fitness traits, the decreased variance by inbreeding is due to dominance, and for multiple-locus fitness traits, this decreased variance is due to both dominance and epistasis (Falconer and Mackay, 1996). Theoretically, the loss of heterozygosity due to inbreeding often causes the loss of dominance and of further additive-by-dominance epistasis and dominance-by-dominance epistasis. However, it may also simultaneously create new additive-by-additive epistasis due to increased homozygosity.
With the use of genomic markers, homozygosity or inbreeding can be estimated more accurately than before and the genomic breeding values can also be predicted more accurately than when using traditional pedigree-based best linear unbiased prediction (BLUP) (Henderson, 1975;Meuwissen et al., 2001). With the use of genomic markers, it is easy to construct the genetic relationship matrices accurately, especially for dominance and epistasis, which are difficult to construct with pedigrees. Thus, genomic markers have facilitated the decomposition of variance components for the economic traits of shellfish as it is generally inconvenient to record pedigrees in practice.
Genetic variation comprises additive, dominance, and epistatic variance. In genomic selection (Meuwissen et al., 2001), genetic evaluations commonly use only additive models; whether a prediction model including non-additive effects can improve prediction accuracy is currently under debate (Gallardo et al., 2010;Wittenburg et al., 2011;Su et al., 2012;Munoz et al., 2014). The inbreeding coefficient can be incorporated into statistical models to fit the inbreeding effect (Joshi et al., 2018); however, whether this will improve the genomic prediction accuracy remains unclear. In this study, different genetic relationship matrices were constructed with genomic markers, the variance components for the growth traits of bay scallops were decomposed with different models, and the inbreeding effect on genomic prediction and the optimization of genomic prediction models were evaluated. The information gleaned from these analyses will be beneficial to further scallop breeding.

Materials
A total of 440 healthy 10-month-old bay scallops were collected from artificial scallop-rearing substrates at Qingdao Jinshatan Fishery Group Co. (Qingdao, Shandong Province, China). After sample collection, encrusted organisms on the scallop shells were removed and brought to the laboratory according to standard procedures (Maeda-Martıìnez et al., 2000). Four growth traits (shell height, length, and width and whole wet weight) were phenotyped. Shell height was measured as the distance from the hinge to the opposite end of the shell. Shell length was measured as the maximum dimension at right angles to the height. Shell width was measured as the greatest vertical distance between the two valves (Wang et al., 2018).

Genotypes
Genomic DNA was extracted from the gills using a standard phenol-trichloromethane method with minor modifications (Sambrook et al., 1989). Subsequently, libraries were constructed with the Multi-isoRAD method (Wang et al., 2016). In brief, the genomic DNA of the 440 selected bay scallop individuals was digested with BsaXI (New England BioLabs, cat. no. R0609) at 37 • C for 45 min. Five DNA fragments were set as one group, and five different pairs of adaptors were ligated with each tag using T4 DNA ligase, respectively, within groups. PCR amplification, digestion, and ligation were then performed thereafter (barcoding and pooling). After two successive amplifications, each production of five concatenated tags from five samples was purified via a MinElute PCR Purification Kit (Qiagen, cat. no. 28004) and pooled to run double-end sequencing on the Illumina Hiseq2000 system. Genotype calling was performed with RADtyping v1.5 software (Fu et al., 2013). Reads with no restriction sites, or those containing ambiguous base calls (N), long homopolymer regions (>10 bp), or excessive numbers of low-quality positions (>10 positions with quality of <20) were removed. A total of 487,336,153 pair of clean reads were generated for all the samples, and 7,610 single-nucleotide polymorphisms were obtained after quality control with the criteria of call rate > 80% and minor allele frequency >5%.

Variance Component Analysis
The variance components for growth traits were normally decomposed with six linear mixed models: additive (A); additive with inbreeding coefficient (AF); additive and dominance (AD); additive and dominance with inbreeding coefficient (ADF); additive, dominance, and epistasis (ADE); and additive, dominance, and epistasis with inbreeding coefficient (ADEF) as: Model AF: y = Xb + Z a u a + fh + e Model AD: y = Xb + Z a u a + Z d u d + e Model ADF: y = Xb + Z a u a + Z d u d + fh + e Model ADE: y = Xb + Z a u a + Z d u d + Z aa u aa + Z ad u ad + Z dd u dd + e Model ADEF: y = Xb + Z a u a + Z d u d + Z aa u aa + Z ad u ad + Z dd u dd + fh + e where y is the phenotypic value, b is the fixed effects, and X is its design matrix. u a , u d , u aa , u ad , and u dd , are the additive effect, dominant effect, additive-by-additive epistasis, additiveby-dominance epistasis, and dominance-by-dominance epistasis, respectively, and their corresponding design matrices are Z a , Z d , Z aa , Z ad , and Z dd , respectively. It was assumed that these genetic effects followed a normal distribution, , and u dd ∼ N(0, K dd σ 2 dd ), repectively. K a, K d , K aa , K ad , and K dd are the genetic relationship matrices of additive effect, dominant effect, additive-by-additive epistasis, additive-by-dominance epistasis, and dominance-by-dominance epistasis, respectively. σ 2 a , σ 2 d , σ 2 aa , σ 2 ad , and σ 2 dd are additive variance, dominant variance, additiveby-additive epistatic variance, additive-by-dominance epistatic variance, and dominance-by-dominance epistatic variance, respectively. h is homozygosity, f is the regression coefficients, e is the random error, and e ∼ N 0, Iσ 2 e , where I is an identity matrix.

Genetic Relationship Matrix
The additive genetic relationship matrix K a was calculated following the method of VanRaden (2008) as: where M is the n × m (n: number of individuals; m: number of markers) genotype matrix; the elements of the ith column in the M matrix are 0-2p i , 1-2p i , and 2-2p i for genotypes A 1 A 1 , A 1 A 2 , and A 2 A 2 , respectively; and p i is the allele frequency of A 2 .
The dominance genetic relationship matrix K d (Su et al., 2012) was constructed as: where H is the n × m (n: number of individuals; m: number of markers) genotype matrix, the elements of the ith column in the H matrix are 0-2 p i (1-p i ), 1-2p i (1-p i ), and 0-2p i (1-p i ) for genotypes A 1 A 1 , A 1 A 2 , and A 2 A 2 , respectively.
The three-epistasis genetic relationship matrices were approximated as: where # denotes the Hadamard product operation.
The significance tests for f in models AF, ADF, and ADEF were as follows: where C 22 corresponds to elements of f in the inverse of the left-hand side (LHS) of mixed-model equations (MME) of the corresponding prediction models. The significance tests for variance components between models were performed with the likelihood ratio (LR) test. As the test statistics followed a mixture of Chi-square distributions, the critical threshold values were as those referenced in Kodde and Palm (1986).
All the analyses with linear mixed modeling were performed with DMU software (Madsen and Jensen, 2002).

Cross-Validation
Five-fold cross-validation was performed to evaluate the reliabilities of genomic prediction with different models; the observation values of one-fifth of the individuals were randomly set as missing and were predicted with the others. The prediction reliability was measured as the correlation between predicted values and true observation values. The cross-validation was performed with 30 replicates.

Variance Component Decomposition
The average shell height, length, width, and weight were 59.6743, 63.1105, 2.6759 cm, and 39.5733 g, respectively, and their corresponding standard errors were 0.1677, 0.1682, 0.0094 cm, and 0.2762 g, respectively.
For shell height (Table 1), the likelihoods of models A, AD, and ADE were nearly the same, indicating that no variances of dominance and epistasis existed, and the genetic variation was mainly caused by additive effects. The f values in models AF, ADF, and ADEF were not significant, indicating that no significant inbreeding depression was directly a result of increased homozygosity.
For shell length (Table 1), a large portion of dominance variance was decomposed in models AD, ADF, and ADE, and a large portion of epistatic variance was decomposed in models ADE and AEDF. However, as for shell height, the comparison between the likelihoods of these models showed that dominance and epistatic variance components were not significant, likewise indicating that the main genetic variance was caused by additive effects.
For shell width (Table 1), when dominance was included in the model, the additive variance was nearly null, such as in the comparison between models A and AD or between models AF and ADF. When dominance-by-dominance epistasis was included in the model, both the additive and dominance variances were nearly null. However, the LR test (e.g., between models A and AD) was not larger than the threshold value 2.7 at a 0.05 significance level, and it is usually unrelated between genomic relationship matrices K a and K d ; therefore, it is necessary to evaluate model fitness carefully when optimizing statistical models for variance decomposition. As for shell height and length, neither significant inbreeding effect was found.
For body weight (Table 1), the comparison between models A and AD showed that significant dominance variance existed and accounted for greater than 90% of the genetic variance. The comparisons between models AD and ADE and between models ADF and ADEF showed that no significant epistatic variance accounted for the body weight. When additive-by-additive epistasis was included in the model, the additive variance decreased to 0 because of the strong correlation between genomic relationship matrices K a and K aa . Model AF showed that inbreeding effect had a significant influence on body weight; however, models ADF and ADEF showed no significant inbreeding effect on body weight. The comparison between models AF and ADF showed that no significant dominant variance accounted for body weight. The comparison between models A and AD was not consistent with the comparison between models AF and ADF, which also indicated the importance of model evaluation. In model name, letters A, D, E, and F mean additive effect, dominance, epistasis, and inbreeding coefficients, respectively. −2logL represents the −2 times log likelihood of the corresponding model; σ 2 a , σ 2 d , σ 2 aa , σ 2 ad , σ 2 dd , and σ 2 e are variance of additive, dominance, additive by additive, additive by dominance, dominance by dominance effects, and random error, respectively. f is regression coefficient of homozygosity, t is its t statistic, the numeric in the brackets are standard errors of the corresponding statics.

Genomic Prediction Reliability
The genomic prediction reliabilities ranged from 0.21 to 0.28 for most models of these four growth traits (Table 1), and the reliabilities of different models for the same traits differed within a narrow range. For shell height and length, model A had the greatest genomic prediction reliabilities, whereas for the shell width and weight, model ADE had the greatest prediction reliabilities. The models that included insignificant inbreeding did not improve the genomic prediction reliability compared with that of the corresponding models without inbreeding. Generally, models with a significant inbreeding effect improve genomic prediction reliability; however, model AF had a lower reliability than that of model A for body weight, even though the inbreeding effect was significant in model AF for body weight.

DISCUSSION
In models with homozygosity used in this study, f is usually used to fit directional dominance (Joshi et al., 2020). The insignificant regression coefficients for shell height, length, and width indicated that these three traits had no directional dominance. As no epistasis was found for shell height and length, the genetic variations for these two traits were caused by only additive effects. For shell width, the additive variance in model AD was nearly 0; hence, the model was equivalent to the model with only dominance variance components. Likewise, model ADE was equivalent to the model with only the dominanceby-dominance component for shell width. Models A, AD, and ADE had similar likelihoods, which differed within a very small range; therefore, it is difficult to determine the main sources of genetic variation. Genomic relation matrices K d and K dd were highly correlated ( Table 2), so it is reasonable to conclude that the likelihoods of models AD and ADE were quite close. Conversely, K a and K d are usually not highly correlated; however, the dominance component fit the model better than the additive component. For simplicity, it is very convenient to selectively breed using model A. However, given the likelihoods of these three models, it may be the most appropriate to fit the shell width with dominance-by-dominance epistasis. A dominant effect and dominance-by-dominance epistasis cannot be inherited by progeny, whereas an additive effect can; therefore, it is difficult to improve shell width through selection. One possible solution in practice would be to select different homozygous parents to commercially produce heterozygous progeny. Further research is required to determine the most appropriate model for shell width.
Dominance accounted for the highest component of genetic variation in body weight by the comparison between models A and AD, and only a small portion of genetic variance was accounted for by additive effects. Therefore, the model with only a dominance component (model D) was then analyzed The upper and lower triangular numerics were the correlation between diagonal and off-diagonal elements of the corresponding matrices, respectively. K a , K d , K aa , K ad , and K dd are the genomic relationship matrix for additive, dominance, additive by additive, additive by dominance, and dominance by dominance effects, respectively.
for body weight, and the results showed that −2logL was 159.6, indicating that the additive variance component was not significant. Due to the strong correlation between K a and K aa ( Table 2), when additive-by-additive epistasis and additive effects were simultaneously included in the model, the additive variance component was nearly 0. The f was significant in model AF for body weight, whereas it was not significant in models ADF and ADEF. Combined with the comparison between models D and AD, the directional dominance of body weight was not significant; thus, nearly all of the genetic variation came from dominance. The results showed the impossibility of improving body weight in the population. Therefore, one possible solution is to utilize the hybrid vigor between parents with relatively long genetic distances. Cross-validation is a commonly used and effective approach to study the robustness of a statistical model. In genomic prediction, cross-validation is frequently used to study the reliability of the prediction model with real data. Correlation between the likelihoods and reliabilities of the genomic prediction model is not required. The −2logL of models A, AD, and ADE for shell height was nearly the same, but the reliability of model A was the highest. Conversely, for shell length, the −2logL of models A, AD, and ADE gradually decreased, although the reliability of model A was still the highest. This is primarily because the dominance and epistasis components were not significant for these two traits.
However, for shell width and weight, when the −2logL of models A, AD, and ADE gradually decreased, their reliabilities increased, although the magnitudes of increase were small ( Table 1). For shell width, models AD and ADE can be simplified as models with only dominance and only dominance-dominance epistasis, respectively. Therefore, it is not necessary to test the significance of each variance component near 0 in the corresponding models. Combining the reliability and −2logL, the models AD (or model D) and ADE (or model with only dominance-by-dominance epistasis) were well-fitted to the data. Due to the high correlation between K d and K dd ( Table 2), both models can be used to predict shell width. For body weight, the dominance component clearly accounted for a large portion of genetic variation. Therefore, the model with only dominance was subsequently performed, showing that the additive effect was not significant by comparison to that in model AD. Cross-validation showed that the model with only dominance had the highest reliability of 0.2773 for body weight.
Although traditional family selection has been utilized for generations of breeding, the bay scallop genetic diversity decreased dramatically in recent years (Zheng et al., 2007) due to a shortage of systematic breeding and mating management. The genetic diversity may easily decrease or even vanish because breeding only a few parents has been sufficient for farming high-fecundity species such as the bay scallop (Gjerde, 1986). Consequently, additive genetic variation is very easily lost, resulting in no potential genetic gain by selection.
Model A was a frequently used model for genomic prediction because the genomic breeding values were additive effects. Dominance and epistasis were sometimes incorporated into the genomic prediction models; however, it was relatively rarely reported. Based on the results of this study, we strongly suggest optimizing models for genomic prediction before using them in practice. The models should be as simple as possible, using likelihood ratio tests, and the model fitness should be evaluated by both likelihood and cross-validation. Such model optimization is beneficial for formulating breeding strategies.

AUTHOR CONTRIBUTIONS
HL and ZB designed the experiment. YW, QX, and LZ collected and measured the samples. QZ and XH performed the laboratory work. HL and YZ analyzed the data. HL, QX, and XH wrote the manuscript. All authors contributed to the article and approved the submitted version.