<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article article-type="research-article" dtd-version="2.3" xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title>Frontiers in Genetics</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Genet.</abbrev-journal-title>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">896774</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2022.896774</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Use of DNA pools of a reference population for genomic selection of a binary trait in Atlantic salmon</article-title>
<alt-title alt-title-type="left-running-head">Dagnachew et al.</alt-title>
<alt-title alt-title-type="right-running-head">
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fgene.2022.896774">10.3389/fgene.2022.896774</ext-link>
</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Dagnachew</surname>
<given-names>Binyam</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="corresp" rid="c001">&#x2a;</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1672335/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Aslam</surname>
<given-names>Muhammad Luqman</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/522510/overview"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hillestad</surname>
<given-names>Borghild</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Meuwissen</surname>
<given-names>Theo</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sonesson</surname>
<given-names>Anna</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/180978/overview"/>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Fisheries and Aquaculture Research</institution>, <institution>Nofima AS&#x2014;Norwegian Institute of Food</institution>, <addr-line>Troms&#xf8;</addr-line>, <country>Norway</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Benchmark Genetics</institution>, <addr-line>Bergen</addr-line>, <country>Norway</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>Norwegian University of Life Sciences</institution>, <addr-line>&#xc5;s</addr-line>, <country>Norway</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>
<bold>Edited by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/468618/overview">Alexandre Wagner Silva Hilsdorf</ext-link>, University of Mogi das Cruzes, Brazil</p>
</fn>
<fn fn-type="edited-by">
<p>
<bold>Reviewed by:</bold> <ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/178186/overview">Hugo H Montaldo</ext-link>, National Autonomous University of Mexico, Mexico</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://loop.frontiersin.org/people/339104/overview">Timothy D. Leeds</ext-link>, United States Department of Agriculture (USDA), United States</p>
</fn>
<corresp id="c001">&#x2a;Correspondence: Binyam Dagnachew, <email>Binyam.dagnachew@nofima.no</email>
</corresp>
<fn fn-type="other">
<p>This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>26</day>
<month>08</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>896774</elocation-id>
<history>
<date date-type="received">
<day>15</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>07</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Dagnachew, Aslam, Hillestad, Meuwissen and Sonesson.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Dagnachew, Aslam, Hillestad, Meuwissen and Sonesson</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>Genomic selection has great potential in aquaculture breeding because many important traits are not measured directly on the selection candidates. However, its implementation has been hindered by high genotyping costs, since many individuals must be genotyped. In this study, we explored the potential of DNA pooling for creating a reference population as a tool for genomic selection of a binary trait. Two datasets from the SalmoBreed population challenged with salmonid alphavirus, which causes pancreas disease, were used. Dataset-1, which includes 855 individuals (478 survivors and 377 dead), was used to develop four DNA pool samples (i.e., two pools each for dead and surviving fish). Dataset-2 includes 914 individuals (435 survivors and 479 dead) belonging to 65 full-sibling families and was used to develop in-silico DNA pools. SNP effects from the pool data were calculated based on allele frequencies estimated from the pools and used to calculate genomic breeding values (GEBVs). The correlation between SNP effects estimated from individual genotypes and from pooled data increased from 0.3 to 0.912 when the number of pools increased from 1 to 200. A similar trend was observed for the correlation between GEBVs, which increased from 0.84 to 0.976 as the number of pools per phenotype increased from 1 to 200. For dataset-1, the accuracy of prediction was 0.71 and 0.70 when the DNA pools were sequenced at 40&#xd7; and 20&#xd7;, respectively, compared to an accuracy of 0.73 for the SNP chip genotypes. For dataset-2, the accuracy of prediction increased from 0.574 to 0.691 when the number of in-silico DNA pools increased from 1 to 200. For this dataset, the accuracy of prediction using individual genotypes was 0.712. Sequencing depth had a limited effect on the correlation of GEBVs and on prediction accuracy. The results showed that a large number of pools is required to achieve prediction as good as that from individual genotypes; alternative pooling strategies should therefore be studied to reduce the number of pools without reducing the prediction power. Nevertheless, this study demonstrates that pooling of a reference population can be used as a tool to balance cost against accuracy of selection.</p>
</abstract>
<kwd-group>
<kwd>genomic selection</kwd>
<kwd>DNA pooling</kwd>
<kwd>reference population</kwd>
<kwd>salmon</kwd>
<kwd>in-silico</kwd>
</kwd-group>
<contract-sponsor id="cn001">European Commission<named-content content-type="fundref-id">10.13039/501100000780</named-content>
</contract-sponsor>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1 Introduction</title>
<p>Genomic selection (GS) is becoming a practical and effective breeding tool for many livestock species because of the rapid development of high-throughput genotyping technologies that reduce genotyping costs. The potential application of GS for aquaculture species has been studied (<xref ref-type="bibr" rid="B19">Nielsen et al., 2009</xref>; <xref ref-type="bibr" rid="B28">Sonesson and Meuwissen, 2009</xref>), and these studies showed an increase in genetic gain, especially for traits that are difficult to improve by traditional selection, such as disease resistance. GS is therefore of particular interest in aquaculture species, as most breeding goal traits in these species are measured on sibs of the selection candidates.</p>
<p>For a conventional GS breeding program, two large datasets are required: a training set (reference population) with genotyped and phenotyped individuals and a prediction set (selection candidates) containing only genotyped individuals (<xref ref-type="bibr" rid="B18">Meuwissen et al., 2001</xref>; <xref ref-type="bibr" rid="B6">Goddard and Hayes, 2007</xref>). The sizes of these datasets determine the rate of genetic improvement by influencing components of the breeder&#x2019;s equation. On the one hand, the prediction accuracy relies on the size of the training set used to estimate parameters (i.e., marker effects) and on the marker density at which reference individuals are genotyped. On the other hand, the size of the prediction set determines the selection intensity and consequently the response to selection. However, increasing either the training set or the prediction set increases the cost of GS programs.</p>
<p>Even though the genotyping cost per individual is falling, implementing conventional GS is more expensive in aquaculture than in other livestock species because a large number of selection candidates and their siblings must be genotyped. Therefore, it is of interest to reduce either the number of individuals or the number of markers to genotype without reducing the prediction accuracy significantly. Strategies for reducing the number of markers have been described in many studies (<xref ref-type="bibr" rid="B17">Lillehammer et al., 2013</xref>; <xref ref-type="bibr" rid="B22">&#xd8;deg&#xe5;rd and Meuwissen, 2014</xref>; <xref ref-type="bibr" rid="B2">Dagnachew and Meuwissen, 2019</xref>; <xref ref-type="bibr" rid="B29">Tsairidou et al., 2020</xref>). This study investigates the impact of reducing the number of individuals to genotype by pooling DNA samples from a reference population.</p>
<p>Pooling and sequencing of DNA samples have provided a cost-effective alternative for a wide range of genomic applications, such as population genetics (<xref ref-type="bibr" rid="B5">Gautier et al., 2013</xref>), genome-wide association studies (<xref ref-type="bibr" rid="B25">Sham et al., 2002</xref>), and estimation of SNP effects for quantitative traits (<xref ref-type="bibr" rid="B8">Henshall et al., 2012</xref>; <xref ref-type="bibr" rid="B1">Bell et al., 2017</xref>). In theory, estimating marker effects from pooled DNA differs from estimating them from individual genotypes in three respects. First, in pooled DNA samples, only marker allele frequencies can be estimated, whereas in standard genotyping, individual marker genotypes are obtained. Second, in pooled DNA samples, marker allele frequencies will normally be estimated with some degree of technical error, for example because the individuals in a pool contribute unequally to the sequenced reads. Third, for DNA pools, the quantitative trait value of each individual cannot be assigned to a particular marker genotype, since information on individual genotypes is not available. However, these problems can be overcome by using the allele frequencies at each tail to estimate the respective genotype frequencies and by assigning the sample average at each tail to every individual at that tail (<xref ref-type="bibr" rid="B8">Henshall et al., 2012</xref>; <xref ref-type="bibr" rid="B5">Gautier et al., 2013</xref>).</p>
<p>
<xref ref-type="bibr" rid="B27">Sonesson et al. (2010)</xref> used simulation to study DNA pooling of test individuals in combination with communal rearing of families as a means of reducing genotyping costs in aquaculture GS schemes. The study reported selection accuracies of up to 0.88, depending on the number of test individuals and the number of markers. However, to date, the potential of DNA pooling of a reference population for genomic prediction has not been evaluated using data from practical breeding work. Therefore, in this study, we demonstrate the potential of DNA pooling for GS using both pooled DNA samples and <italic>in-silico</italic> DNA pooling. The effects of the number of pools and of sequencing depth on the selection accuracy are studied. The study uses datasets generated as part of practical breeding work to improve resistance against pancreas disease (PD).</p>
</sec>
<sec id="s2">
<title>2 Materials and methods</title>
<sec id="s2-1">
<title>2.1 Datasets</title>
<p>Two datasets from two year-classes (YC) of the SalmoBreed breeding population were used for this study. Dataset-1 was from YC 2013, and dataset-2 was from YC 2015. The trait of interest was resistance to PD. PD is currently among the most economically important diseases in Norwegian Atlantic salmon production. It is caused by a salmonid alphavirus (SAV), of which the SAV2 and SAV3 variants are found in Norway. These datasets were generated as part of SalmoBreed&#x2019;s annual challenge tests in the practical breeding work to improve host resistance against PD. The first dataset was used for real DNA sample pooling and the second was used for <italic>in-silico</italic> pooling.</p>
<sec id="s2-1-1">
<title>2.1.1 Dataset-1: DNA pooling of samples</title>
<p>Data and tissue samples on 5,223 postsmolts belonging to 273 full-sib families from the SalmoBreed elite population challenged with SAV3 were available. The mortality profiles for this dataset are presented in <xref ref-type="fig" rid="F1">Figures 1A and C</xref>. Mortalities started 5&#xa0;days post challenge and ended 25&#xa0;days post challenge with the peak mortality observed at 12&#xa0;days post challenge (<xref ref-type="fig" rid="F1">Figure 1A</xref>). A histogram of the number of full-sibling families across the 273 full-sib families with a given percentage of mortality is plotted in <xref ref-type="fig" rid="F1">Figure 1C</xref>. The full-sibling mortality rate ranged from 0 to 100%, with an average mortality rate of 67% (<xref ref-type="fig" rid="F1">Figure 1C</xref>). The individuals could be selected and grouped using survival information (early dead and/or late survivors) from the challenge test. Hence, 855 individuals were selected to develop four pools (i.e., M1, M2, S1, and S2 with the initial &#x201c;M&#x201d; representing the pool of mortalities/dead and &#x201c;S&#x201d; denoting the pool of surviving individuals).</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption>
<p>Mortality rate profiles of the datasets: <bold>(A)</bold> the number of mortalities observed per day over the course of the challenge trial (29&#xa0;days) for dataset-1. Mortalities started 5&#xa0;days post challenge and ended 25&#xa0;days post challenge with the peak mortality observed at 12&#xa0;days post challenge. <bold>(B)</bold> the number of mortalities observed per day over the course of the challenge trial (56&#xa0;days) for dataset-2. Mortalities started 7&#xa0;days post challenge and ended 21&#xa0;days post challenge with the peak mortality observed at 13&#xa0;days post challenge. <bold>(C)</bold> dataset-1&#x2014;the number of full-sib families across the 273 full-sib families with a given percentage mortality. The mortality rate ranged from 0% to 100%, with an average mortality rate of 67%. <bold>(D)</bold> dataset-2&#x2014;the number of full-sib families across the 282 full-sib families with a given percentage mortality. The mortality rate ranged from 0% to 100%, with an average mortality rate of 48%.</p>
</caption>
<graphic xlink:href="fgene-13-896774-g001.tif"/>
</fig>
<p>DNA was extracted from each selected individual, quantified using the Quant-iT PicoGreen dsDNA assay kit, and normalized to a standard concentration for every individual; subsequently, equal quantities/volumes of DNA from each individual were pooled to make a specific pool group (M1, M2, S1, and S2). The M1 and M2 pools contained DNA representing 173 and 204 mortalities, respectively, while pools S1 and S2 incorporated DNA from 205 and 273 surviving individuals, respectively. The dead individuals of pools M1 and M2 represented 28 families, while pools S1 and S2 were represented by 35 and 34 families, respectively.</p>
<p>Sequencing libraries were prepared using Illumina PCR-free genomic DNA sample prep kits and sequenced to approximately 40&#xd7; depth. The sequencing was performed with an Illumina NextGen 500 instrument to obtain paired-end sequence reads of 150&#xa0;bp. Trimmomatic software was used to perform adaptor and quality trimming of the generated sequence reads, and the resulting high-quality sequence data were aligned to the Atlantic salmon reference genome (assembly ICSASG_v2) using BWA-MEM, version 0.7.13-r1126 (<xref ref-type="bibr" rid="B15">Li, 2013</xref>). SNP detection, genotype calling, and allele frequency estimation at each locus were performed using SAMtools, version 1.2 (<xref ref-type="bibr" rid="B16">Li et al., 2009</xref>). All individuals included in the four pools (M1, M2, S1, and S2) were also individually genotyped using the &#x223c;57&#xa0;K Axiom Affymetrix SNP genotyping array (NOFSAL2). The SNPs shared between the two genotyping methods (sequencing vs. Axiom array) were identified, yielding 45,812 common SNPs that were used for further genomic analyses.</p>
</sec>
<sec id="s2-1-2">
<title>2.1.2 Dataset-2: <italic>In-silico</italic> DNA pooling</title>
<p>Data from 4,115 postsmolts belonging to 282 full-sib families from the SalmoBreed elite population challenged with SAV3 were available. The mortality profiles for this dataset are presented in <xref ref-type="fig" rid="F1">Figures 1B and D</xref>. Mortalities started 7&#xa0;days post challenge and ended 21&#xa0;days post challenge with peak mortality observed at 13&#xa0;days post challenge (<xref ref-type="fig" rid="F1">Figure 1B</xref>). A histogram of the number of full-sibling families across the 282 full-sib families with a given percentage of mortality is plotted in <xref ref-type="fig" rid="F1">Figure 1D</xref>. The full-sibling mortality rate ranged from 0 to 100%, with an average mortality rate of 48% (<xref ref-type="fig" rid="F1">Figure 1D</xref>). From the dataset, 914 individuals (435 survivors and 479 dead), belonging to 65 full-sib families, were selected based on family-wide mortality rates. The dead individuals represented 58 families, while the surviving individuals represented 60 families. The data were split into a reference set (589 samples: 308 survivors and 281 dead) and a validation set (325 samples: 127 survivors and 198 dead). Individuals were split into the reference and validation sets at random within each full-sib family; hence, each family was represented in both sets.</p>
<p>Genotypes from the customized NOFSAL2 SNP array with &#x223c;57&#xa0;K SNPs were available for each individual. Pooled genotypes were generated <italic>in-silico</italic> as the allele frequencies of the individuals in each pool; no random error was added to the pool genotypes.</p>
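<p>As a minimal sketch of this step (not the script used in the study), the error-free pooled genotype of one pool can be computed as the mean allele dosage of its members divided by two. The function name pool_allele_frequency and the 0/1/2 dosage coding of the input are assumptions made for illustration.</p>
<preformat>
```python
import numpy as np


def pool_allele_frequency(dosages):
    """Return the allele frequency per SNP for one pool.

    dosages: (n_individuals, n_snps) array of allele counts coded 0, 1, or 2
    (assumed coding). Each diploid individual carries two alleles, so the
    pool frequency is the mean dosage divided by two.
    """
    dosages = np.asarray(dosages, dtype=float)
    return dosages.mean(axis=0) / 2.0


# Two individuals genotyped at three SNPs
example = [[0, 1, 2],
           [2, 1, 2]]
print(pool_allele_frequency(example))
```
</preformat>
<p>In this sketch every individual contributes equally to the pool, which matches the statement that no random error was added to the pool genotypes.</p>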
</sec>
</sec>
<sec id="s2-2">
<title>2.2 Calculation of SNP allele frequencies</title>
<sec id="s2-2-1">
<title>2.2.1 Allele frequency in DNA pools (dataset-1)</title>
<p>The pools were sequenced to an average depth of &#x223c;40&#xd7;, and the allele frequencies in each pool (S1, S2, M1, and M2) were obtained using a customized script implemented with <italic>bcftools</italic>, version 1.10.2 (<xref ref-type="bibr" rid="B3">Danecek et al., 2021</xref>). The observed total allelic depth/frequency for each discovered variant in each pool was obtained using &#x201c;<italic>INFO/AD</italic>&#x201d;, and the expression &#x201c;<italic>FORMAT/AD</italic>&#x201d; was used to obtain the observed frequency of the alternative allele. Moreover, the sequencing depth of each pool was reduced to 20&#xd7; by random sampling of sequence reads, followed by alignment, variant calling, and estimation of frequencies for both reference and nonreference alleles. The sequencing depth per pool was reduced to test its effect on the accuracy of prediction.</p>
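<p>To illustrate how an allele frequency follows from allelic depths, the sketch below converts one AD value, written as comma-separated read counts with the reference allele first, into the frequency of the alternative allele. It is a hypothetical helper, not the customized script used in the study, and it assumes a biallelic site with integer counts.</p>
<preformat>
```python
def alt_allele_frequency(ad_field):
    """Frequency of the alternative allele from an AD string such as "30,10".

    Assumes a biallelic site: the first count is the reference allele depth
    and the second is the alternative allele depth. Returns None when no
    reads cover the site, so that the frequency is not reported as zero.
    """
    ref_depth, alt_depth = (int(value) for value in ad_field.split(","))
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total


print(alt_allele_frequency("30,10"))
```
</preformat>
<p>At a depth of 40&#xd7;, a frequency estimated this way can only take values in steps of 1/40, which is one reason why sequencing depth may affect the precision of pool frequencies.</p>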
</sec>
<sec id="s2-2-2">
<title>2.2.2 Allele frequency in <italic>in-silico</italic> DNA pooling (dataset-2)</title>
<p>The reference set was used to calculate allele frequencies by <italic>in-silico</italic> pooling of the genotypes of the individuals in this set. The individuals were assigned to different numbers of pools (i.e., 1, 2, 4, 10, 20, 40, 100, 150, and 200) per phenotype group (i.e., dead and survivors). The average numbers of individuals and families per pool and per phenotype group are summarized in <xref ref-type="table" rid="T1">Table 1</xref>. Allele frequencies in each pool were calculated by sampling with replacement from the individual genotypes. The number of times the sampling was performed is related to the average sequencing depth, and sampling was performed 20, 40, and 100 times (20&#xd7;, 40&#xd7;, and 100&#xd7;). This process was repeated independently 60 times.</p>
<table-wrap id="T1" position="float">
<label>TABLE 1</label>
<caption>
<p>Summary of the number of pools and the average number of individuals and families per pool per phenotype group.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">&#x23; Pools</th>
<th colspan="2" align="left">Dead pool</th>
<th colspan="2" align="left">Alive pool</th>
</tr>
<tr>
<th align="left">&#x23; Fish/pool</th>
<th align="left">&#x23; Family/pool</th>
<th align="left">&#x23; Fish/pool</th>
<th align="left">&#x23; Family/pool</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td align="left">281</td>
<td align="left">58</td>
<td align="left">308</td>
<td align="left">60</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">140.5</td>
<td align="left">49.5</td>
<td align="left">154</td>
<td align="left">56</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">70.25</td>
<td align="left">37.75</td>
<td align="left">77</td>
<td align="left">42.5</td>
</tr>
<tr>
<td align="left">10</td>
<td align="left">28.1</td>
<td align="left">22.2</td>
<td align="left">30.8</td>
<td align="left">24.6</td>
</tr>
<tr>
<td align="left">20</td>
<td align="left">14.05</td>
<td align="left">12.35</td>
<td align="left">15.4</td>
<td align="left">13.7</td>
</tr>
<tr>
<td align="left">40</td>
<td align="left">8.3</td>
<td align="left">7.7</td>
<td align="left">9.22</td>
<td align="left">7.65</td>
</tr>
<tr>
<td align="left">100</td>
<td align="left">2.81</td>
<td align="left">2.76</td>
<td align="left">3.08</td>
<td align="left">3.03</td>
</tr>
<tr>
<td align="left">150</td>
<td align="left">1.87</td>
<td align="left">1.85</td>
<td align="left">2.05</td>
<td align="left">2.05</td>
</tr>
<tr>
<td align="left">200</td>
<td align="left">1.41</td>
<td align="left">1.4</td>
<td align="left">1.54</td>
<td align="left">1.52</td>
</tr>
</tbody>
</table>
</table-wrap>
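<p>The in-silico pooling and read sampling described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented: individuals of one phenotype group are assigned to pools at random, each individual contributes equally to its pool, and drawing a read is modelled as picking one allele at random from the pool, so that the number of alternative-allele reads at a SNP is a binomial draw with the pool frequency as success probability. The function name simulate_pools and its arguments are hypothetical.</p>
<preformat>
```python
import numpy as np


def simulate_pools(dosages, n_pools, depth, rng):
    """Simulate pooled allele frequencies for one phenotype group.

    dosages: (n_individuals, n_snps) array of allele counts coded 0, 1, or 2.
    n_pools: number of pools (should not exceed the number of individuals).
    depth:   number of reads drawn with replacement per SNP and pool.
    Returns an (n_pools, n_snps) array of estimated allele frequencies.
    """
    dosages = np.asarray(dosages, dtype=float)
    order = rng.permutation(dosages.shape[0])  # random pool membership (assumption)
    frequencies = []
    for members in np.array_split(order, n_pools):
        true_freq = dosages[members].mean(axis=0) / 2.0  # error-free pool frequency
        alt_reads = rng.binomial(depth, true_freq)       # reads carrying the counted allele
        frequencies.append(alt_reads / depth)
    return np.vstack(frequencies)


rng = np.random.default_rng(2022)
dead = rng.integers(0, 3, size=(281, 5))  # toy genotypes, not real data
print(simulate_pools(dead, n_pools=10, depth=40, rng=rng).shape)
```
</preformat>
<p>Repeating the call with different random seeds would correspond to the independent replicates, and changing the depth argument to 20, 40, or 100 would correspond to the three sampling levels.</p>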
</sec>
</sec>
<sec id="s2-3">
<title>2.3 Estimation of marker effects</title>
<p>The main difference between individual genotypes and pooled DNA regarding estimation of marker effects is that, for the latter, the trait value of an individual cannot be assigned exclusively to a particular marker genotype. Therefore, marker effects must be estimated from marker allele frequencies calculated from pooled DNA.</p>
<p>For the individual genotypes, marker effects were estimated by fitting the marker-based genomic model (SNP-BLUP):<disp-formula id="equ1">
<mml:math id="m1">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>X</mml:mi>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf1">
<mml:math id="m2">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of phenotypes for the trait, <inline-formula id="inf2">
<mml:math id="m3">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is the overall mean, <inline-formula id="inf3">
<mml:math id="m4">
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula> is the vector of 1s, <italic>X</italic> is the matrix of genotype dosages (coded as 0, 1, and 2) for all SNPs and all animals, <inline-formula id="inf4">
<mml:math id="m5">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of marker effects and it is assumed that <inline-formula id="inf5">
<mml:math id="m6">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf6">
<mml:math id="m7">
<mml:mrow>
<mml:mo>&#xa0;</mml:mo>
<mml:mi>e</mml:mi>
<mml:mo>&#xa0;</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of random residuals, for which it is assumed that <inline-formula id="inf7">
<mml:math id="m8">
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>e</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> where <inline-formula id="inf8">
<mml:math id="m9">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>m</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the variance of marker effects, <inline-formula id="inf9">
<mml:math id="m10">
<mml:mi>I</mml:mi>
</mml:math>
</inline-formula> is the identity matrix, and <inline-formula id="inf10">
<mml:math id="m11">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mi>e</mml:mi>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the residual error variance.</p>
<p>For the pools, the marker effects were estimated by fitting a slightly modified marker-based genomic model:<disp-formula id="equ2">
<mml:math id="m12">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:mn>1</mml:mn>
<mml:mi>&#x3bc;</mml:mi>
<mml:mo>&#x2b;</mml:mo>
<mml:mi>Z</mml:mi>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x2b;</mml:mo>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf11">
<mml:math id="m13">
<mml:mrow>
<mml:msub>
<mml:mi>y</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of phenotypes of the pools for the trait, <inline-formula id="inf12">
<mml:math id="m14">
<mml:mi>&#x3bc;</mml:mi>
</mml:math>
</inline-formula> is the overall mean, <inline-formula id="inf13">
<mml:math id="m15">
<mml:mn>1</mml:mn>
</mml:math>
</inline-formula> is the vector of 1s, <italic>Z</italic> is the matrix of average allele frequencies for all SNPs and for all pools, <inline-formula id="inf14">
<mml:math id="m16">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of marker effects estimated from allele frequencies of pools, and <inline-formula id="inf15">
<mml:math id="m17">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of random residuals. It is assumed that <inline-formula id="inf16">
<mml:math id="m18">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula> and <inline-formula id="inf17">
<mml:math id="m19">
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>&#x223c;</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>I</mml:mi>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>, where <inline-formula id="inf18">
<mml:math id="m20">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the variance of marker effects estimated from the marker frequencies of the DNA pools and <inline-formula id="inf19">
<mml:math id="m21">
<mml:mrow>
<mml:msubsup>
<mml:mi>&#x3c3;</mml:mi>
<mml:mrow>
<mml:mi>e</mml:mi>
<mml:mo>&#x2212;</mml:mo>
<mml:mi>p</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula> is the residual error variance. Both models were fitted using singular value decomposition-based SNP-BLUP (<xref ref-type="bibr" rid="B21">&#xd8;deg&#xe5;rd et al., 2018</xref>), which is suitable for large-scale genomic predictions.</p>
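<p>Both models have the same ridge-type form, so one small routine can illustrate them. The sketch below is not the singular value decomposition-based solver cited above; for simplicity it solves the mixed model equations directly, which is only practical for small examples. The function name snp_blup and the argument lam, taken here as the ratio of the residual variance to the marker-effect variance, are assumptions made for illustration. The design matrix is <italic>X</italic> for individual genotypes or <italic>Z</italic> for pool allele frequencies.</p>
<preformat>
```python
import numpy as np


def snp_blup(design, y, lam):
    """Solve y = 1*mu + design*b + e with random marker effects b.

    design: (n_records, n_markers) matrix (genotype dosages or pool frequencies).
    y:      vector of n_records phenotypes.
    lam:    assumed variance ratio sigma_e^2 / sigma_m^2 used to shrink b.
    Returns (mu, b).
    """
    design = np.asarray(design, dtype=float)
    y = np.asarray(y, dtype=float)
    n_records, n_markers = design.shape
    w = np.hstack([np.ones((n_records, 1)), design])  # columns: [1, design]
    penalty = np.eye(n_markers + 1) * lam
    penalty[0, 0] = 0.0                                # the mean is fixed: no shrinkage
    solution = np.linalg.solve(w.T @ w + penalty, w.T @ y)
    return solution[0], solution[1:]


mu, b = snp_blup([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0], lam=2.0)
print(mu, b)
```
</preformat>
<p>A larger value of lam shrinks the marker effects more strongly towards zero, which is the behaviour expected when the marker-effect variance is small relative to the residual variance.</p>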
</sec>
<sec id="s2-4">
<title>2.4 Estimation of genomic breeding values</title>
<p>Genomic breeding values (GEBVs) for the selection candidates were estimated by summing the effects of the markers multiplied by the standardized genotypes. Two GEBVs per individual were predicted: one using SNP effects estimated from the pool data and one using SNP effects estimated from the individual genotypes.<disp-formula id="equ3">
<mml:math id="m22">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>B</mml:mi>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:mo>&#x3d;</mml:mo>
<mml:munderover>
<mml:mstyle displaystyle="true">
<mml:mo>&#x2211;</mml:mo>
</mml:mstyle>
<mml:mi>i</mml:mi>
<mml:mi>m</mml:mi>
</mml:munderover>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf20">
<mml:math id="m23">
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>B</mml:mi>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mi>j</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the vector of predicted GEBVs for individual j, <inline-formula id="inf21">
<mml:math id="m24">
<mml:mrow>
<mml:msub>
<mml:mi>X</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the standardized genotypes for individual <italic>j</italic>, and <inline-formula id="inf22">
<mml:math id="m25">
<mml:mrow>
<mml:msub>
<mml:mi>b</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula> is the calculated marker effects either from pools of samples or individual genotypes.</p>
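<p>The summation described in this section can be illustrated with a minimal Python sketch. It is not the authors&#x2019; implementation: the toy genotypes, the marker effect values, the helper names, and the particular standardization (centring allele counts by 2<italic>p</italic> and scaling by the square root of 2<italic>p</italic>(1 &#x2212; <italic>p</italic>)) are assumptions made for illustration only. The genotype matrix is stored with individuals in rows and markers in columns.</p>
<preformat>
```python
import numpy as np


def standardize_genotypes(genotypes, allele_freq):
    """Centre 0/1/2 allele counts by 2p and scale by sqrt(2p(1 - p)).

    This scaling is an assumption; the text only says "standardized".
    """
    p = np.asarray(allele_freq, dtype=float)
    g = np.asarray(genotypes, dtype=float)
    return (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def predict_gebv(std_genotypes, marker_effects):
    """Return one GEBV per individual: the sum over markers of genotype * effect."""
    return std_genotypes @ marker_effects


# Toy data: 3 individuals (rows) by 4 markers (columns), coded as allele counts.
genotypes = np.array([[0, 1, 2, 1],
                      [1, 1, 0, 2],
                      [2, 0, 1, 0]])
allele_freq = genotypes.mean(axis=0) / 2.0  # per-marker allele frequency
X = standardize_genotypes(genotypes, allele_freq)

# Hypothetical marker effects from the two reference designs.
b_individual = np.array([0.10, -0.05, 0.02, 0.00])
b_pool = np.array([0.08, -0.04, 0.03, 0.01])

# Two GEBVs per selection candidate, one per set of marker effects.
gebv_individual = predict_gebv(X, b_individual)
gebv_pool = predict_gebv(X, b_pool)
print(gebv_individual, gebv_pool)
```
</preformat>
<p>The same standardized genotypes are used for both predictions, so any difference between the two GEBV vectors comes only from the marker effects.</p>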
</sec>
<sec id="s2-5">
<title>2.5 Model evaluation</title>
<p>For dataset-1 (DNA pooling of samples), the effect of sequencing depth on the accuracy of selection was studied by reducing the coverage from the originally available 40&#xd7; to 20&#xd7;. SNP effects from the pool data were calculated based on allele frequencies estimated from the pools and used to calculate GEBVs as described in Section 2.4.</p>
<p>For dataset-2 (in-silico DNA pooling), the effect of sequence coverage on the accuracy of selection was studied by setting the number of sampling times (i.e., the average sequencing coverage) to 40&#xd7; and 100&#xd7;. SNP effects from the pool data were calculated based on allele frequencies estimated from the pools and compared with SNP effects estimated from individual genotypes.</p>
<p>The accuracy of selection was calculated as the correlation between predicted GEBVs and phenotypes, divided by the square root of the heritability (<italic>h</italic>
<sup>2</sup> &#x3d; 0.3).<disp-formula id="equ4">
<mml:math id="m26">
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>y</mml:mi>
<mml:mo>&#xa0;</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>&#x3d;</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>&#x3c1;</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>G</mml:mi>
<mml:mi>E</mml:mi>
<mml:mi>B</mml:mi>
<mml:mi>V</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>y</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:msup>
<mml:mi>h</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>where <inline-formula id="inf23">
<mml:math id="m27">
<mml:mi>&#x3c1;</mml:mi>
</mml:math>
</inline-formula> is the Pearson product-moment correlation coefficient, <italic>GEBV</italic> is the vector of estimated genomic breeding values, <inline-formula id="inf24">
<mml:math id="m28">
<mml:mi>y</mml:mi>
</mml:math>
</inline-formula> is the adjusted phenotype, and <italic>h</italic>
<sup>2</sup> is the heritability of the trait.</p>
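<p>As a hedged illustration of the accuracy formula above, the following Python sketch divides the Pearson correlation between GEBVs and phenotypes by the square root of the heritability. The function name and the toy vectors are hypothetical; only the formula and the value <italic>h</italic><sup>2</sup> &#x3d; 0.3 come from the text.</p>
<preformat>
```python
import numpy as np


def selection_accuracy(gebv, phenotype, h2=0.3):
    """Pearson correlation between GEBVs and phenotypes, divided by sqrt(h2)."""
    rho = np.corrcoef(gebv, phenotype)[0, 1]
    return rho / np.sqrt(h2)


# Toy values: six validation animals with a binary (dead = 0, survived = 1) phenotype.
gebv = np.array([0.2, -0.1, 0.4, 0.0, -0.3, 0.1])
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])

acc = selection_accuracy(gebv, y)
print(acc)
```
</preformat>
<p>Because the correlation is divided by a value smaller than one, the reported accuracy is always larger in magnitude than the raw correlation between GEBVs and phenotypes.</p>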
</sec>
</sec>
<sec id="s3">
<title>3 Results</title>
<sec id="s3-1">
<title>3.1 Accuracy of allele frequency estimation</title>
<p>Sequencing of DNA pools from individuals gives estimates of allele frequencies at SNPs. For dataset-1, the observed allele counts at each locus in the sequenced pools provided the allele frequency estimates. The accuracies of marker allele frequencies calculated from the in-silico DNA pools at three sequencing depths (i.e., 20&#xd7;, 40&#xd7;, and 100&#xd7;) are presented in <xref ref-type="fig" rid="F2">Figure 2</xref>. The accuracies were calculated as the Pearson correlation coefficients between the true allele frequencies (i.e., calculated from the individual genotypes) and the frequencies calculated from the in-silico pools. The accuracy of allele frequency estimation was affected by both the number of pools and the average sequence coverage. As the number of pools increased from one, where all individuals form a single pool, to the maximum, where each individual forms its own pool, the accuracy of allele frequency estimation improved markedly, especially at low average sequence coverage (<xref ref-type="fig" rid="F2">Figure 2</xref>). For example, at 40&#xd7; coverage, the Pearson correlation coefficient between pool-based and individual-based allele frequency estimates increased from 0.892 with a single DNA pool to 0.99 with 10 DNA pools (<xref ref-type="fig" rid="F2">Figure 2</xref>). Similar trends were observed for the other sequencing coverages (<xref ref-type="fig" rid="F2">Figure 2</xref>). Increasing the sequence coverage also improved the accuracy of allele frequency estimation, particularly when the number of pools was small (<xref ref-type="fig" rid="F2">Figure 2</xref>). When each individual formed its own pool (i.e., no. of pools &#x3d; 914), the correlation between allele frequencies calculated from the individual genotypes and from the pools was equal to 1, regardless of the sequencing coverage.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption>
<p>Accuracy of marker allele frequency calculated from different numbers of in-silico DNA pools and sequencing depths. Accuracy was calculated as the Pearson correlation coefficient between the true allele frequencies (&#x201c;Tfreq&#x201d;) (i.e., calculated from the individual genotypes) and frequencies calculated using in-silico DNA pools (&#x201c;Pfreq&#x201d;) (<italic>n</italic> &#x3d; 1, 2, 4, 10, 100, and 914). There were 914 individuals, and the maximum number of pools (<italic>n</italic> &#x3d; 914) represents one individual per pool.</p>
</caption>
<graphic xlink:href="fgene-13-896774-g002.tif"/>
</fig>
<p>Differences in SNP allele frequencies between survivor and dead in-silico DNA pools are presented in <xref ref-type="fig" rid="F3">Figure 3</xref>. The plotted values are the absolute allele frequency differences calculated from 100 pools per phenotype group. The figure shows larger differences in frequencies between dead and survivor pools for the SNPs located on chromosome 3.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption>
<p>Manhattan plot of allele frequency differences between survivor and dead pools for each SNP. The plotted allele frequency differences were calculated from 100 pools per phenotype group. Chromosome 30 represents markers belonging to unknown chromosome(s).</p>
</caption>
<graphic xlink:href="fgene-13-896774-g003.tif"/>
</fig>
</sec>
<sec id="s3-2">
<title>3.2 Correlation between SNP effects</title>
<p>The correlations between SNP effects estimated from individual genotypes and those estimated from allele frequencies of the in-silico DNA pools are presented in <xref ref-type="fig" rid="F4">Figure 4</xref>. The correlation increased from 0.3 to 0.898 for 40&#xd7; and from 0.311 to 0.912 for 100&#xd7; sequence coverage when the number of pools increased from 1 to 200 per phenotype group (<xref ref-type="fig" rid="F4">Figure 4</xref>). Overall, the impact of sequencing coverage on the correlation between SNP effects was limited, although its importance increased as the number of pools decreased. An exception to this trend was observed when 1 pool per phenotype group was used, where virtually no difference in SNP effect correlation was observed between 40&#xd7; and 100&#xd7; (<xref ref-type="fig" rid="F4">Figure 4</xref>).</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption>
<p>Correlation between SNP effects estimated from individual genotypes and in-silico DNA pool genotypes.</p>
</caption>
<graphic xlink:href="fgene-13-896774-g004.tif"/>
</fig>
</sec>
<sec id="s3-3">
<title>3.3 Correlation between genomic breeding values</title>
<p>Two sets of GEBVs for the validation dataset were calculated based on SNP effects estimated from individual genotypes and from in-silico pooled DNA. Pearson correlation coefficients between these two sets of GEBVs are presented in <xref ref-type="fig" rid="F5">Figure 5</xref>. The number of pools was the determining factor: at 40&#xd7; sequencing coverage, the correlation increased from 0.84 with a single pool per phenotype group to 0.976 with 200 pools (<xref ref-type="fig" rid="F5">Figure 5</xref>). These correlations were barely affected by the sequence coverage.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption>
<p>Correlation between genomic breeding values (GEBV) estimated from individual genotypes and in-silico DNA pool genotypes.</p>
</caption>
<graphic xlink:href="fgene-13-896774-g005.tif"/>
</fig>
</sec>
<sec id="s3-4">
<title>3.4 Genomic prediction accuracy</title>
<p>For the validation set of dataset-2 (in-silico DNA pooling), genomic prediction accuracies were calculated as the Pearson correlation coefficients between true phenotypes and predicted GEBVs, divided by the square root of the heritability of PD (<xref ref-type="table" rid="T2">Table 2</xref>). The prediction accuracy was 0.712 for the individual genotypes and, for the in-silico DNA pools, ranged from 0.574 to 0.687 as the number of pools increased from 1 to 200. <xref ref-type="table" rid="T2">Table 2</xref> also presents the percentage decrease (% decreased), i.e., the decline in accuracy for the DNA pools relative to the individual genotypes. Regardless of the sequencing coverage, a decline in accuracy of approximately 19% was observed for in-silico DNA pools when 10 or fewer pools per group were used. The loss in accuracy was reduced to less than 10% when 100 pools per phenotype group were used and to less than 4% when 200 pools were used. The difference between 40&#xd7; and 100&#xd7; sequencing coverage with respect to prediction accuracy was limited (<xref ref-type="table" rid="T2">Table 2</xref>).</p>
<table-wrap id="T2" position="float">
<label>TABLE 2</label>
<caption>
<p>Accuracy of selection of dataset-2&#x2013;in-silico DNA pooling.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th rowspan="2" align="left">No. of pools</th>
<th colspan="3" align="left">Accuracy of prediction</th>
<th colspan="2" align="left">% Decreased</th>
</tr>
<tr>
<th align="center">Individual</th>
<th align="center">40&#xd7;</th>
<th align="center">100&#xd7;</th>
<th align="center">40&#xd7;</th>
<th align="center">100&#xd7;</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">1</td>
<td rowspan="9" align="char" char="plusmn">0.712 &#xb1; 0.005</td>
<td align="char" char="plusmn">0.574 &#xb1; 0.006</td>
<td align="char" char="plusmn">0.575 &#xb1; 0.006</td>
<td align="char" char=".">19.38</td>
<td align="char" char=".">19.24</td>
</tr>
<tr>
<td align="left">2</td>
<td align="char" char="plusmn">0.574 &#xb1; 0.004</td>
<td align="char" char="plusmn">0.575 &#xb1; 0.002</td>
<td align="char" char=".">19.38</td>
<td align="char" char=".">19.24</td>
</tr>
<tr>
<td align="left">4</td>
<td align="char" char="plusmn">0.575 &#xb1; 0.003</td>
<td align="char" char="plusmn">0.575 &#xb1; 0.002</td>
<td align="char" char=".">19.24</td>
<td align="char" char=".">19.24</td>
</tr>
<tr>
<td align="left">10</td>
<td align="char" char="plusmn">0.576 &#xb1; 0.003</td>
<td align="char" char="plusmn">0.576 &#xb1; 0.001</td>
<td align="char" char=".">19.10</td>
<td align="char" char=".">19.10</td>
</tr>
<tr>
<td align="left">20</td>
<td align="char" char="plusmn">0.581 &#xb1; 0.005</td>
<td align="char" char="plusmn">0.581 &#xb1; 0.002</td>
<td align="char" char=".">18.54</td>
<td align="char" char=".">18.40</td>
</tr>
<tr>
<td align="left">40</td>
<td align="char" char="plusmn">0.594 &#xb1; 0.008</td>
<td align="char" char="plusmn">0.596 &#xb1; 0.006</td>
<td align="char" char=".">16.57</td>
<td align="char" char=".">16.29</td>
</tr>
<tr>
<td align="left">100</td>
<td align="char" char="plusmn">0.642 &#xb1; 0.009</td>
<td align="char" char="plusmn">0.641 &#xb1; 0.011</td>
<td align="char" char=".">9.83</td>
<td align="char" char=".">9.97</td>
</tr>
<tr>
<td align="left">150</td>
<td align="char" char="plusmn">0.667 &#xb1; 0.012</td>
<td align="char" char="plusmn">0.671 &#xb1; 0.011</td>
<td align="char" char=".">6.32</td>
<td align="char" char=".">5.76</td>
</tr>
<tr>
<td align="left">200</td>
<td align="char" char="plusmn">0.684 &#xb1; 0.012</td>
<td align="char" char="plusmn">0.687 &#xb1; 0.012</td>
<td align="char" char=".">3.93</td>
<td align="char" char=".">3.51</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Accuracy of prediction using <italic>in-silico</italic> DNA pools of the reference population for different numbers of pools and sequencing coverages. The % decreased is the percentage decrease in accuracy of prediction for 40&#xd7; and 100&#xd7; coverage compared with the individual genotypes. The presented accuracies are means over 60 replicates, and the values following &#xb1; are the standard deviations over the 60 replicates.</p>
</fn>
</table-wrap-foot>
</table-wrap>
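<p>The &#x201c;% decreased&#x201d; column can be recomputed from the mean accuracies with a few lines of Python. This is a sketch only: the function name is hypothetical, and the inputs are the rounded 40&#xd7; means copied from <xref ref-type="table" rid="T2">Table 2</xref>, so individual entries may differ in the second decimal from the published values if those were derived from unrounded means.</p>
<preformat>
```python
def percent_decrease(acc_individual, acc_pool):
    """Relative loss in accuracy (in %) of pooled versus individual genotyping."""
    return 100.0 * (acc_individual - acc_pool) / acc_individual


# Mean accuracies copied from Table 2 (40x coverage), keyed by pools per phenotype group.
acc_individual = 0.712
acc_pool_40x = {1: 0.574, 10: 0.576, 100: 0.642, 200: 0.684}

loss_40x = {
    n_pools: round(percent_decrease(acc_individual, acc), 2)
    for n_pools, acc in acc_pool_40x.items()
}
print(loss_40x)
```
</preformat>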
<p>The prediction accuracies for dataset-1 (DNA pooling of samples) and for the individual SNP chip data are shown in <xref ref-type="table" rid="T3">Table 3</xref>. The accuracy of prediction was 0.737 for the individual SNP chip data, and 0.716 and 0.700 for the pooled data when sequencing coverage was 40&#xd7; and 20&#xd7;, respectively. The accuracy was thus up to &#x223c;5% higher for the individual SNP chip data than for the pooled data.</p>
<table-wrap id="T3" position="float">
<label>TABLE 3</label>
<caption>
<p>Accuracy of prediction for dataset-1&#x2013;DNA pooling of samples.</p>
</caption>
<table>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="left">No. of SNPs</th>
<th align="left">Accuracy of prediction</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">SNP chip</td>
<td align="left">51646</td>
<td align="left">0.737 &#xb1; 0.006</td>
</tr>
<tr>
<td align="left">Sequence 40&#xd7;</td>
<td align="left">44538</td>
<td align="left">0.716 &#xb1; 0.004</td>
</tr>
<tr>
<td align="left">Sequence 20&#xd7;</td>
<td align="left">45812</td>
<td align="left">0.700 &#xb1; 0.003</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4">
<title>4 Discussion</title>
<p>Implementing conventional GS in aquaculture species is very expensive because of the very large number of selection candidates and test-sibs to be genotyped. In recent years, emphasis has been placed on cost-efficient GS designs that minimize either the number of individuals or the number of markers to genotype without significantly reducing the accuracy of selection (<xref ref-type="bibr" rid="B17">Lillehammer et al., 2013</xref>; <xref ref-type="bibr" rid="B2">Dagnachew and Meuwissen, 2019</xref>; <xref ref-type="bibr" rid="B29">Tsairidou et al., 2020</xref>). This study investigated the potential of DNA pooling for creating a reference population for genomic prediction of PD resistance (a binary observation) in salmon. It demonstrated that SNP effects in a reference population can be estimated from SNP allele frequencies calculated from DNA pools, and that GEBVs for selection candidates can then be computed economically with acceptable accuracy.</p>
<p>The datasets used in this study were generated as part of SalmoBreed&#x2019;s practical breeding work to improve host resistance against PD. PD is currently one of the most economically important diseases in the Norwegian production of Atlantic salmon (<xref ref-type="bibr" rid="B11">Jansen et al., 2010</xref>). The disease is caused by SAV, of which at least three distinct genotypes have been identified (<xref ref-type="bibr" rid="B10">Hodneland et al., 2005</xref>; <xref ref-type="bibr" rid="B4">Fringuelli et al., 2008</xref>). Mortality rates in PD outbreaks vary widely; survivors may eventually die from secondary infections and increased parasitism, and may suffer reduced growth and degraded product quality (<xref ref-type="bibr" rid="B14">Lerfall et al., 2012</xref>). Moderate heritabilities have been reported from field outbreak data (<xref ref-type="bibr" rid="B20">Norris et al., 2008</xref>) and from controlled challenge testing using different challenge models (<xref ref-type="bibr" rid="B7">Gonen et al., 2015</xref>). Considerable efforts have been made to map genes for PD resistance for use in marker-assisted selection, and a QTL for PD resistance was mapped to Atlantic salmon chromosome 3 using both fry and smolt challenge test data from two populations (<xref ref-type="bibr" rid="B7">Gonen et al., 2015</xref>; <xref ref-type="bibr" rid="B9">Hillestad et al., 2020</xref>). The differences in SNP frequencies between survivor and dead in-silico DNA pools (<xref ref-type="fig" rid="F3">Figure 3</xref>) were larger for the SNPs located on chromosome 3. This is consistent with SNPs in that region being associated with biological mechanisms that influence resistance to PD.</p>
<p>Sequencing of DNA pools from individuals gives estimates of allele frequencies at SNPs with little or no loss in accuracy and at a considerably lower cost than individual genotyping. Estimation of marker effects relies heavily on the accuracy of the allele frequencies calculated from DNA pools. The accuracy of allele frequency estimation from pooled DNA samples depends on several experimental design parameters (<xref ref-type="bibr" rid="B5">Gautier et al., 2013</xref>), such as the number of individuals merged in a pool, the sequencing coverage, and the possibility of unequal contribution of each individual genome to the final sequencing reads. The effect of the number of individuals merged in a pool was studied by varying the number of individuals in the pools (<xref ref-type="table" rid="T1">Table 1</xref>). The effect of sequencing coverage was studied by changing the number of sampling times (i.e., average sequencing coverage) for the in-silico pools and by varying the sequencing coverage for the pooled DNA samples. The effect of unequal contribution of individuals, however, was not assessed in this study. Accurate equimolar pooling of each genomic DNA is important for an equal distribution of reads (<xref ref-type="bibr" rid="B12">Konczal et al., 2014</xref>), and the number of pooled samples should be balanced for accurate allele frequency estimation (<xref ref-type="bibr" rid="B5">Gautier et al., 2013</xref>; <xref ref-type="bibr" rid="B12">Konczal et al., 2014</xref>); consequently, the implications of unequal DNA contribution for genomic prediction accuracy should be investigated.</p>
<p>For allele frequency calculation and SNP effect estimation, the importance of sequencing coverage decreased as the number of DNA pools increased (<xref ref-type="fig" rid="F2">Figures 2</xref> and <xref ref-type="fig" rid="F4">4</xref>). Given a fixed number of samples, the number of individuals per pool decreased as the number of pools increased (<xref ref-type="table" rid="T1">Table 1</xref>), and the benefit of high sequencing coverage therefore diminished. This pattern agrees with <xref ref-type="bibr" rid="B23">Rellstab et al. (2013)</xref>, who reported that higher sequencing coverages (&#x3e;50&#xd7;) have no significant effect on the accuracy of allele frequency estimation and that only very low coverages (below 20&#xd7;) would substantially reduce the precision. In the current study, an exception to the general trend occurred when 1 pool per phenotype group was used, where the advantage of higher sequencing coverage was visible for allele frequency accuracy (<xref ref-type="fig" rid="F2">Figure 2</xref>) but not for the correlation of SNP effects (<xref ref-type="fig" rid="F4">Figure 4</xref>). In that case, the SNP effect correlations were poor for both 40&#xd7; and 100&#xd7; coverage.</p>
<p>The accuracy of GS is expected to increase as the number of genotyped and phenotyped animals in the reference population increases for any trait, and in particular for traits with low heritability (<xref ref-type="bibr" rid="B26">Solberg et al., 2008</xref>). Our results showed that the SNP effect correlations, GEBV correlations, and prediction accuracy increased as the number of pools increased from 1 to 200 per phenotype group (<xref ref-type="fig" rid="F4">Figures 4</xref> and <xref ref-type="fig" rid="F5">5</xref>; <xref ref-type="table" rid="T2">Table 2</xref>). <xref ref-type="bibr" rid="B8">Henshall et al. (2012)</xref> showed that a large number of small pools estimates allele frequencies more accurately than a small number of large pools. As the number of pools increased, the accuracy of allele frequency estimation increased (<xref ref-type="fig" rid="F2">Figure 2</xref>) and the prediction accuracy also increased (<xref ref-type="table" rid="T2">Table 2</xref>). Moreover, for larger numbers of pools, there was little or no difference in SNP frequency accuracies and SNP effect correlations among the different sequencing coverages (<xref ref-type="fig" rid="F2">Figures 2</xref> and <xref ref-type="fig" rid="F4">4</xref>). Similar trends have been reported for large numbers of pools (<xref ref-type="bibr" rid="B8">Henshall et al., 2012</xref>; <xref ref-type="bibr" rid="B1">Bell et al., 2017</xref>). Furthermore, the high correlations between GEBVs estimated from individual genotypes and from DNA pools, particularly for large numbers of pools (<xref ref-type="fig" rid="F5">Figure 5</xref>), indicate that there is limited to no reranking of individuals.</p>
<p>The accuracy of breeding values based on pedigree information for dataset-2 (using only the phenotypes of the 914 individuals and their pedigree) was 0.48 (results not shown). This accuracy was substantially improved by the use of genomic information from the pools, especially for large numbers of pools, which is in agreement with other reports (<xref ref-type="bibr" rid="B27">Sonesson et al., 2010</xref>; <xref ref-type="bibr" rid="B2">Dagnachew and Meuwissen, 2019</xref>; <xref ref-type="bibr" rid="B13">Kriaridou et al., 2020</xref>; <xref ref-type="bibr" rid="B29">Tsairidou et al., 2020</xref>). The loss of accuracy from using DNA pools instead of individual genotypes was minimal for dataset-1 (0.737 vs. 0.716, <xref ref-type="table" rid="T3">Table 3</xref>). However, the difference in prediction accuracy was substantial for dataset-2 (0.712 vs. 0.574, <xref ref-type="table" rid="T2">Table 2</xref>) for the same number of pools. One explanation is that, for dataset-1, the prediction accuracies were obtained for the same individuals that made up the pools (i.e., the reference and validation individuals were the same), whereas for dataset-2 the validation individuals were different from the reference population. Furthermore, computer simulation of DNA pooling is an approximation and might fail to capture some parameters of real pooled sequencing.</p>
<p>Published results on the use of DNA pooling for genomic prediction are scarce. Our results show a trade-off between the number of DNA pools and the loss of prediction accuracy. A reduction in prediction accuracy means a reduction in genetic gain. This has a cost implication that is complex to quantify, as it is determined by the trait, the breeding goal, and other specifics of the breeding industry. Assuming a genotyping cost of $20 per individual and a sequencing cost of $300 per sample at 40&#xd7; sequencing depth, the annual genotyping cost for 5,000 individuals (1,000 candidates and 4,000 informant sibs) is $100,000. For the DNA pool scenario, where only the 1,000 candidates are genotyped and the reference siblings are pooled, the cost ranges from $20,600 for a single pool per phenotype to $140,000 for 200 pools per phenotype. Increasing the number of pools also increases the cost of sequencing, and hence an appropriate pooling strategy should strike a balance between cost-effectiveness and accuracy. Furthermore, in the present study, allele frequencies were estimated from the DNA pools by sequencing the pools. However, calculation of allele frequencies from pooled DNA does not necessarily require sequencing. It has been reported that allele frequencies from DNA pools can also be calculated by SNP genotyping of the pools using light intensities (<xref ref-type="bibr" rid="B24">Reverter et al., 2014</xref>); hence, the cost associated with sequencing of the pools can be avoided.</p>
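<p>The cost comparison in the preceding paragraph can be reproduced with a short Python sketch. The unit costs and animal numbers are those stated in the text; the constant and function names are hypothetical, and the use of two phenotype groups (survivors and dead) is an assumption that follows from the binary trait.</p>
<preformat>
```python
GENOTYPING_COST = 20     # USD per individually genotyped animal (from the text)
SEQUENCING_COST = 300    # USD per sequenced pool at 40x depth (from the text)
N_CANDIDATES = 1000      # selection candidates, always genotyped individually
N_SIBS = 4000            # informant sibs forming the reference population
N_PHENOTYPE_GROUPS = 2   # survivors and dead; assumed from the binary trait


def individual_design_cost():
    """Annual cost when candidates and sibs are all genotyped individually."""
    return (N_CANDIDATES + N_SIBS) * GENOTYPING_COST


def pooled_design_cost(pools_per_group):
    """Annual cost when candidates are genotyped and sibs are sequenced in pools."""
    n_pools = pools_per_group * N_PHENOTYPE_GROUPS
    return N_CANDIDATES * GENOTYPING_COST + n_pools * SEQUENCING_COST


costs = {n: pooled_design_cost(n) for n in (1, 10, 100, 150, 200)}
print(individual_design_cost(), costs)
```
</preformat>
<p>Under these assumptions, the pooled design costs $80,000 with 100 pools per phenotype group and $110,000 with 150, so it stops being cheaper than individual genotyping somewhere between those two pool numbers.</p>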
<p>As presented in this study, DNA pooling of a reference population can serve as a cost-effective GS approach. A potential limitation is that the identity of individuals is lost, so individual characteristics and environmental factors cannot be adjusted for in genomic modeling, which may result in a loss in accuracy and biased estimates of genetic effects. In the current study, pooling within phenotype groups was done randomly; however, <xref ref-type="bibr" rid="B8">Henshall et al. (2012)</xref> suggested that pooling within contemporary groups and fitting contemporary group in the model would eliminate some of these limitations. For example, in the studied datasets, pooling within sex and full-sib families would address these limitations.</p>
</sec>
<sec id="s5">
<title>5 Conclusion</title>
<p>DNA pooling of a reference population can serve as a cost-effective GS approach, but with some potential limitations. The results showed that a large number of pools is required to achieve genomic prediction accuracies as good as those from individual genotypes; alternative pooling strategies should therefore be explored to reduce the number of pools without reducing the prediction power. Nevertheless, it is demonstrated that pooling of a reference population can be used as a tool to balance the cost against the accuracy of selection.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability" id="s6">
<title>Data availability statement</title>
<p>The datasets presented in this article are not readily available because they are co-owned by a third party. Requests to access the datasets should be directed to <email>anna.sonesson@nofima.no</email> and <email>hooman.moghadam@bmkgenetics.com</email>.</p>
</sec>
<sec id="s7">
<title>Ethics statement</title>
<p>Ethical review and approval were not required for the animal study because the datasets used in this study were generated as part of Benchmark&#x27;s routine breeding work, which is supervised and approved by the Norwegian Food Safety Authority (Mattilsynet).</p>
</sec>
<sec id="s8">
<title>Author contributions</title>
<p>BD: Conceptualization, investigation, methodology, writing&#x2014;original draft, and writing&#x2014;review and editing. MA: investigation, methodology, and writing&#x2014;review and editing. BH: data curation and writing&#x2014;review and editing. TM: methodology and writing&#x2014;review and editing. AS: conceptualization, funding acquisition, investigation, methodology, and writing&#x2014;review and editing.</p>
</sec>
<sec id="s9">
<title>Funding</title>
<p>This work received funding from the European Union&#x2019;s Seventh Framework Programme (KBBE.2013.1.2-10) under grant agreement no. 613611 (FISHBOOST). The funding body played no role in the design of the study, collection, analysis, and interpretation of data, and in writing the manuscript.</p>
</sec>
<sec sec-type="COI-statement" id="s10">
<title>Conflict of interest</title>
<p>BH was employed by Benchmark Genetics.</p>
<p>The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s11">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bell</surname>
<given-names>A. M.</given-names>
</name>
<name>
<surname>Henshall</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Porto-Neto</surname>
<given-names>L. R.</given-names>
</name>
<name>
<surname>Dominik</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>McCulloch</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Kijas</surname>
<given-names>J.</given-names>
</name>
<etal/>
</person-group> (<year>2017</year>). <article-title>Estimating the genetic merit of sires by using pooled DNA from progeny of undetermined pedigree</article-title>. <source>Genet. Sel. Evol.</source> <volume>49</volume> (<issue>1</issue>), <fpage>28</fpage>. <pub-id pub-id-type="doi">10.1186/s12711-017-0303-8</pub-id> </citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dagnachew</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T.</given-names>
</name>
</person-group> (<year>2019</year>). <article-title>Accuracy of within-family multi-trait genomic selection models in a sib-based aquaculture breeding scheme</article-title>. <source>Aquaculture</source> <volume>505</volume>, <fpage>27</fpage>&#x2013;<lpage>33</lpage>. <pub-id pub-id-type="doi">10.1016/j.aquaculture.2019.02.036</pub-id> </citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Danecek</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bonfield</surname>
<given-names>J. K.</given-names>
</name>
<name>
<surname>Liddle</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ohan</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Pollard</surname>
<given-names>M. O.</given-names>
</name>
<etal/>
</person-group> (<year>2021</year>). <article-title>Twelve years of SAMtools and BCFtools</article-title>. <source>GigaScience</source> <volume>10</volume> (<issue>2</issue>), <fpage>giab008</fpage>. <pub-id pub-id-type="doi">10.1093/gigascience/giab008</pub-id> </citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fringuelli</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Rowley</surname>
<given-names>H. M.</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>J. C.</given-names>
</name>
<name>
<surname>Hunter</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Rodger</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Graham</surname>
<given-names>D. A.</given-names>
</name>
<etal/>
</person-group> (<year>2008</year>). <article-title>Phylogenetic analyses and molecular epidemiology of European salmonid alphaviruses (SAV) based on partial E2 and nsP3 gene nucleotide sequences</article-title>. <source>J. Fish. Dis.</source> <volume>31</volume> (<issue>11</issue>), <fpage>811</fpage>&#x2013;<lpage>823</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2761.2008.00944.x</pub-id> </citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gautier</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Foucaud</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gharbi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Cezard</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Galan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Loiseau</surname>
<given-names>A.</given-names>
</name>
<etal/>
</person-group> (<year>2013</year>). <article-title>Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping</article-title>. <source>Mol. Ecol.</source> <volume>22</volume> (<issue>14</issue>), <fpage>3766</fpage>&#x2013;<lpage>3779</lpage>. <pub-id pub-id-type="doi">10.1111/mec.12360</pub-id> </citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goddard</surname>
<given-names>M. E.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>B. J.</given-names>
</name>
</person-group> (<year>2007</year>). <article-title>Genomic selection</article-title>. <source>J. Anim. Breed. Genet.</source> <volume>124</volume> (<issue>6</issue>), <fpage>323</fpage>&#x2013;<lpage>330</lpage>. <pub-id pub-id-type="doi">10.1111/j.1439-0388.2007.00702.x</pub-id> </citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gonen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Baranski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Thorland</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Norris</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Grove</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Arnesen</surname>
<given-names>P.</given-names>
</name>
<etal/>
</person-group> (<year>2015</year>). <article-title>Mapping and validation of a major QTL affecting resistance to pancreas disease (Salmonid alphavirus) in Atlantic salmon (<italic>Salmo salar</italic>)</article-title>. <source>Heredity</source> <volume>115</volume> (<issue>5</issue>), <fpage>405</fpage>&#x2013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1038/hdy.2015.37</pub-id> </citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Henshall</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>Hawken</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Dominik</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Barendse</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2012</year>). <article-title>Estimating the effect of SNP genotype on quantitative traits from pooled DNA samples</article-title>. <source>Genet. Sel. Evol.</source> <volume>44</volume>, <fpage>12</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9686-44-12</pub-id> </citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hillestad</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Makvandi-Nejad</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Krasnov</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Moghadam</surname>
<given-names>H. K.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Identification of genetic loci associated with higher resistance to pancreas disease (PD) in Atlantic salmon (<italic>Salmo salar</italic> L.)</article-title>. <source>BMC Genomics</source> <volume>21</volume> (<issue>1</issue>), <fpage>388</fpage>. <pub-id pub-id-type="doi">10.1186/s12864-020-06788-4</pub-id> </citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hodneland</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Bratland</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Christie</surname>
<given-names>K. E.</given-names>
</name>
<name>
<surname>Endresen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Nylund</surname>
<given-names>A.</given-names>
</name>
</person-group> (<year>2005</year>). <article-title>New subtype of salmonid alphavirus (SAV), Togaviridae, from Atlantic salmon <italic>Salmo salar</italic> and rainbow trout <italic>Oncorhynchus mykiss</italic> in Norway</article-title>. <source>Dis. Aquat. Organ.</source> <volume>66</volume> (<issue>2</issue>), <fpage>113</fpage>&#x2013;<lpage>120</lpage>. <pub-id pub-id-type="doi">10.3354/dao066113</pub-id> </citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jansen</surname>
<given-names>M. D.</given-names>
</name>
<name>
<surname>Taksdal</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Wasmuth</surname>
<given-names>M. A.</given-names>
</name>
<name>
<surname>Gjerset</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Brun</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Olsen</surname>
<given-names>A. B.</given-names>
</name>
<etal/>
</person-group> (<year>2010</year>). <article-title>Salmonid alphavirus (SAV) and pancreas disease (PD) in Atlantic salmon, <italic>Salmo salar</italic> L., in freshwater and seawater sites in Norway from 2006 to 2008</article-title>. <source>J. Fish. Dis.</source> <volume>33</volume> (<issue>5</issue>), <fpage>391</fpage>&#x2013;<lpage>402</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2761.2009.01131.x</pub-id> </citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Konczal</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Koteja</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Stuglik</surname>
<given-names>M. T.</given-names>
</name>
<name>
<surname>Radwan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Babik</surname>
<given-names>W.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Accuracy of allele frequency estimation using pooled RNA-Seq</article-title>. <source>Mol. Ecol. Resour.</source> <volume>14</volume> (<issue>2</issue>), <fpage>381</fpage>&#x2013;<lpage>392</lpage>. <pub-id pub-id-type="doi">10.1111/1755-0998.12186</pub-id> </citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kriaridou</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tsairidou</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Houston</surname>
<given-names>R. D.</given-names>
</name>
<name>
<surname>Robledo</surname>
<given-names>D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Genomic prediction using low density marker panels in aquaculture: performance across species, traits, and genotyping platforms</article-title>. <source>Front. Genet.</source> <volume>11</volume>, <fpage>124</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2020.00124</pub-id> </citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lerfall</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Larsson</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Birkeland</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Taksdal</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Dalgaard</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Afanasyev</surname>
<given-names>S.</given-names>
</name>
<etal/>
</person-group> (<year>2012</year>). <article-title>Effect of pancreas disease (PD) on quality attributes of raw and smoked fillets of Atlantic salmon (<italic>Salmo salar</italic> L.)</article-title>. <source>Aquaculture</source> <volume>324&#x2013;325</volume>, <fpage>209</fpage>&#x2013;<lpage>217</lpage>. <pub-id pub-id-type="doi">10.1016/j.aquaculture.2011.11.003</pub-id> </citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM</article-title>. <comment>arXiv:1303.3997</comment>. </citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Homer</surname>
<given-names>N.</given-names>
</name>
<etal/>
</person-group> (<year>2009</year>). <article-title>The sequence alignment/map format and SAMtools</article-title>. <source>Bioinformatics</source> <volume>25</volume> (<issue>16</issue>), <fpage>2078</fpage>&#x2013;<lpage>2079</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id> </citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lillehammer</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H. E.</given-names>
</name>
<name>
<surname>Sonesson</surname>
<given-names>A. K.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>A low-marker density implementation of genomic selection in aquaculture using within-family genomic breeding values</article-title>. <source>Genet. Sel. Evol.</source> <volume>45</volume> (<issue>1</issue>), <fpage>39</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9686-45-39</pub-id> </citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meuwissen</surname>
<given-names>T. H.</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>B. J.</given-names>
</name>
<name>
<surname>Goddard</surname>
<given-names>M. E.</given-names>
</name>
</person-group> (<year>2001</year>). <article-title>Prediction of total genetic value using genome-wide dense marker maps</article-title>. <source>Genetics</source> <volume>157</volume> (<issue>4</issue>), <fpage>1819</fpage>&#x2013;<lpage>1829</lpage>. <pub-id pub-id-type="doi">10.1093/genetics/157.4.1819</pub-id> </citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nielsen</surname>
<given-names>H. M.</given-names>
</name>
<name>
<surname>Sonesson</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Yazdi</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H. E.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Comparison of accuracy of genome-wide and BLUP breeding value estimates in sib based aquaculture breeding schemes</article-title>. <source>Aquaculture</source> <volume>289</volume> (<issue>3&#x2013;4</issue>), <fpage>259</fpage>&#x2013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1016/j.aquaculture.2009.01.027</pub-id> </citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Norris</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Foyle</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ratcliff</surname>
<given-names>J.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Heritability of mortality in response to a natural pancreas disease (SPDV) challenge in Atlantic salmon, <italic>Salmo salar</italic> L., post-smolts on a West of Ireland sea site</article-title>. <source>J. Fish. Dis.</source> <volume>31</volume> (<issue>12</issue>), <fpage>913</fpage>&#x2013;<lpage>920</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2761.2008.00982.x</pub-id> </citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#xd8;deg&#xe5;rd</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Indahl</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Strand&#xe9;n</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H. E.</given-names>
</name>
</person-group> (<year>2018</year>). <article-title>Large-scale genomic prediction using singular value decomposition of the genotype matrix</article-title>. <source>Genet. Sel. Evol.</source> <volume>50</volume> (<issue>1</issue>), <fpage>6</fpage>. <pub-id pub-id-type="doi">10.1186/s12711-018-0373-2</pub-id> </citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>&#xd8;deg&#xe5;rd</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Identity-by-descent genomic selection using selective and sparse genotyping</article-title>. <source>Genet. Sel. Evol.</source> <volume>46</volume> (<issue>1</issue>), <fpage>3</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9686-46-3</pub-id> </citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rellstab</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Zoller</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Tedder</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Gugerli</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Fischer</surname>
<given-names>M. C.</given-names>
</name>
</person-group> (<year>2013</year>). <article-title>Validation of SNP allele frequencies determined by pooled next-generation sequencing in natural populations of a non-model plant species</article-title>. <source>PLoS One</source> <volume>8</volume> (<issue>11</issue>), <fpage>e80422</fpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0080422</pub-id> </citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reverter</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Henshall</surname>
<given-names>J. M.</given-names>
</name>
<name>
<surname>McCulloch</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sasazaki</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hawken</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Lehnert</surname>
<given-names>S. A.</given-names>
</name>
</person-group> (<year>2014</year>). <article-title>Numerical analysis of intensity signals resulting from genotyping pooled DNA samples in beef cattle and broiler chicken</article-title>. <source>J. Anim. Sci.</source> <volume>92</volume> (<issue>5</issue>), <fpage>1874</fpage>&#x2013;<lpage>1885</lpage>. <pub-id pub-id-type="doi">10.2527/jas.2013-7133</pub-id> </citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sham</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Bader</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Craig</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>O&#x27;Donovan</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Owen</surname>
<given-names>M.</given-names>
</name>
</person-group> (<year>2002</year>). <article-title>DNA pooling: a tool for large-scale association studies</article-title>. <source>Nat. Rev. Genet.</source> <volume>3</volume>, <fpage>862</fpage>&#x2013;<lpage>871</lpage>. <pub-id pub-id-type="doi">10.1038/nrg930</pub-id> </citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Solberg</surname>
<given-names>T. R.</given-names>
</name>
<name>
<surname>Sonesson</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Woolliams</surname>
<given-names>J. A.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H. E.</given-names>
</name>
</person-group> (<year>2008</year>). <article-title>Genomic selection using different marker types and densities</article-title>. <source>J. Anim. Sci.</source> <volume>86</volume> (<issue>10</issue>), <fpage>2447</fpage>&#x2013;<lpage>2454</lpage>. <pub-id pub-id-type="doi">10.2527/jas.2007-0010</pub-id> </citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sonesson</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H.</given-names>
</name>
<name>
<surname>Goddard</surname>
<given-names>M. E.</given-names>
</name>
</person-group> (<year>2010</year>). <article-title>The use of communal rearing of families and DNA pooling in aquaculture genomic selection schemes</article-title>. <source>Genet. Sel. Evol.</source> <volume>42</volume>, <fpage>41</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9686-42-41</pub-id> </citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sonesson</surname>
<given-names>A. K.</given-names>
</name>
<name>
<surname>Meuwissen</surname>
<given-names>T. H.</given-names>
</name>
</person-group> (<year>2009</year>). <article-title>Testing strategies for genomic selection in aquaculture breeding programs</article-title>. <source>Genet. Sel. Evol.</source> <volume>41</volume> (<issue>1</issue>), <fpage>37</fpage>. <pub-id pub-id-type="doi">10.1186/1297-9686-41-37</pub-id> </citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tsairidou</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Hamilton</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Robledo</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Bron</surname>
<given-names>J. E.</given-names>
</name>
<name>
<surname>Houston</surname>
<given-names>R. D.</given-names>
</name>
</person-group> (<year>2020</year>). <article-title>Optimizing low-cost genotyping and imputation strategies for genomic selection in Atlantic salmon</article-title>. <source>G3</source> <volume>10</volume> (<issue>2</issue>), <fpage>581</fpage>&#x2013;<lpage>590</lpage>. <pub-id pub-id-type="doi">10.1534/g3.119.400800</pub-id> </citation>
</ref>
</ref-list>
</back>
</article>