Genomic Signatures After Five Generations of Intensive Selective Breeding: Runs of Homozygosity and Genetic Diversity in Representative Domestic and Wild Populations of Turbot (Scophthalmus maximus).

Massive genotyping of single nucleotide polymorphisms (SNP) has opened opportunities for analyzing the way in which selection shapes genomes. Artificial or natural selection usually leaves genomic signatures associated with selective sweeps around the responsible locus. Strong selective sweeps are most often identified either by lower genetic diversity than the genomic average and/or islands of runs of homozygosity (ROHi). Here, we conducted an analysis of selective sweeps in turbot (Scophthalmus maximus) using two SNP datasets from a Northeastern Atlantic population (36 individuals) and a domestic broodstock (46 individuals). Twenty-six families (∼ 40 offspring per family) from this broodstock and three SNP datasets applying differing filtering criteria were used to adjust ROH calling parameters. The best-fitted genomic inbreeding estimate (FROH) was obtained by the sum of ROH longer than 1 Mb, called using a 21,615 SNP panel, a sliding window of 37 SNPs and one heterozygous SNP per window allowed. These parameters were used to obtain the ROHi distribution in the domestic and wild populations (49 and 0 ROHi, respectively). Regions with higher and lower genetic diversity within each population were obtained using sliding windows of 37 SNPs. Furthermore, those regions were mapped in the turbot genome against previously reported genetic markers associated with QTL (Quantitative Trait Loci) and outlier loci for domestic or natural selection to identify putative selective sweeps. Out of the 319 and 278 windows surpassing the suggestive pooled heterozygosity thresholds (ZHp) in the wild and domestic population, respectively, 78 and 54 were retained under more restrictive ZHp criteria. A total of 116 suggestive windows (representing 19 genomic regions) were linked to either QTL for production traits, or outliers for divergent or balancing selection. Twenty-four of them (representing 3 genomic regions) were retained under stricter ZHp thresholds. Eleven QTL/outlier markers were exclusively found in suggestive regions of the domestic broodstock, 7 in the wild population and one in both populations; one (broodstock) and two (wild) of those were found in significant regions retained under more restrictive ZHp criteria in the broodstock and the wild population, respectively. Genome mining and functional enrichment within regions associated with selective sweeps disclosed relevant genes and pathways related to aquaculture target traits, including growth and immune-related pathways, metabolism and response to hypoxia, which showcases how this genome atlas of genetic diversity can be a valuable resource to look for candidate genes related to natural or artificial selection in turbot populations.

Massive genotyping of single nucleotide polymorphisms (SNP) has opened opportunities for analyzing the way in which selection shapes genomes. Artificial or natural selection usually leaves genomic signatures associated with selective sweeps around the responsible locus. Strong selective sweeps are most often identified either by lower genetic diversity than the genomic average and/or islands of runs of homozygosity (ROHi). Here, we conducted an analysis of selective sweeps in turbot (Scophthalmus maximus) using two SNP datasets from a Northeastern Atlantic population (36 individuals) and a domestic broodstock (46 individuals). Twenty-six families (∼ 40 offspring per family) from this broodstock and three SNP datasets applying differing filtering criteria were used to adjust ROH calling parameters. The best-fitted genomic inbreeding estimate (F ROH ) was obtained by the sum of ROH longer than 1 Mb, called using a 21,615 SNP panel, a sliding window of 37 SNPs and one heterozygous SNP per window allowed. These parameters were used to obtain the ROHi distribution in the domestic and wild populations (49 and 0 ROHi, respectively). Regions with higher and lower genetic diversity within each population were obtained using sliding windows of 37 SNPs. Furthermore, those regions were mapped in the turbot genome against previously reported genetic markers associated with QTL (Quantitative Trait Loci) and outlier loci for domestic or natural selection to identify putative selective sweeps. Out of the 319 and 278 windows surpassing the suggestive pooled heterozygosity thresholds (ZHp) in the wild and domestic population, respectively, 78 and 54 were retained under more restrictive ZHp criteria. A total of 116 suggestive windows (representing 19 genomic regions) were linked to either QTL for production traits, or outliers for divergent or balancing selection. Twenty-four of them (representing 3 genomic regions) were retained under stricter ZHp thresholds. Eleven QTL/outlier markers were exclusively found in suggestive regions of the domestic broodstock, 7 in the wild population

INTRODUCTION
Artificial selection has been accomplished by humans to domesticate species with desirable properties, in order to reach more profitable phenotypes. Despite its differences with natural selection, both fit to the same general principle: selective pressure is indirectly applied on specific genomic regions where genes controlling relevant traits are found, thus modifying the allelic frequencies in the target population (Brito et al., 2017). The main difference between them is that domestication involves the relaxation of the selective pressure applied on fitness traits relevant for survival in wild populations, while the pressure on traits relevant for production is intensified (Sun et al., 2014).
A strong selective pressure leads to an increase in the frequencies of favorable allelic variants and, consequently, to a decrease of genetic variation in the gene/s under selection. Moreover, due to the physical association of that locus with nearby loci (genetic linkage), genetic variants of loci in the vicinity are also suffering a decrease of genetic variability through hitch-hiking effects, thus being deviated from the expected proportions under neutrality (Sun et al., 2014). Accordingly, a pattern of linkage disequilibrium (LD) will emerge around the locus targeted by selection. This train of "dragging" events is known as a selective sweep and leaves behind detectable signatures at the genome level (Smith and Haigh, 1974;Qanbari et al., 2012).
In most studies, only strong selective sweeps (where only one allele is targeted by selection) have been described (Johansson et al., 2010;Rubin et al., 2010Rubin et al., , 2012Ramey et al., 2013). Soft selective sweeps (where more than one allele is targeted by selection) are much harder to detect due to their resistance to heterozygosity decrease (Sun et al., 2014), so they are underrepresented in scientific studies (Messer and Petrov, 2013). Since strong selective sweeps determine the increase in the frequency of a specific allele (in some cases leading to complete fixation), its detection is much easier than in soft selective sweeps. Therefore, they are useful for the detection of candidate genes responsible for important traits related to domestication and selection (Smith and Haigh, 1974;Sun et al., 2014). However, care must be taken since candidate regions subject to strong selective sweeps might also represent false positives due to genetic drift (through effects of effective size, founder effect, bottlenecks) or inbreeding (Purfield et al., 2017).
There are several methods to identify selective sweeps through a genomic screening using large numbers of markers, such as SNP (Single Nucleotide Polymorphism) loci. Two main approaches have been particularly employed to identify signatures of selection (Bonhomme et al., 2015). One of them explores the distribution of genetic diversity across the genome using indexes such as pooled heterozygosity, which has been applied in different livestock species (Rubin et al., 2010(Rubin et al., , 2012Qanbari et al., 2012;Sun et al., 2014;Zwane et al., 2019). On the other hand, the analysis of ROH (Runs of Homozygosity), among its many applications, has frequently been used to study genomic signatures of selection (Metzger et al., 2015). A strong selective pressure will reduce genetic diversity on the targeted locus and neighbor loci. The newly formed haplotypes could give rise to ROHs among the individuals of the population experiencing the selective pressure. ROH analysis represents an informative methodology to address demographic studies (Kirin et al., 2010;Ceballos et al., 2018b) and to map recessive mutations in complex diseases (Nalls et al., 2009). In some domestic animals, it has been used to estimate genetic diversity across the genome (Metzger et al., 2015).
Furthermore, genome-wide ROH analysis is a useful tool to identify genomic signatures of inbreeding. Inbreeding reduces genetic diversity within individuals, leading to the appearance of long ROH genome stretches. The longer the ROH, the more recently the consanguineous event has occurred, since recombination would have had fewer opportunities to break up those ROH (Purfield et al., 2017). ROH analysis has provided more precise inbreeding estimates than methods based on pedigrees (Ferenčaković et al., 2017). However, care must be taken since not all ROHs can be attributable to Identity by Descent (IBD). Short ROH segments can occur by chance (Identity by State; IBS) (Marras et al., 2014) due to the uneven recombination frequency across the genome. This fact determines the existence of common haplotypes among nonrelated ancestors as a consequence of genetic drift or the evolutionary history of the population under study (Purfield et al., 2017). Genome-wide ROH analysis has also facilitated the study of inbreeding in natural populations in evolutionary and conservation studies (Kardos et al., 2016).
Selective sweeps have been detected in various domestic species such as goat (Brito et al., 2017), cattle (Gutiérrez-Gil et al., 2015), sheep (Moradi et al., 2012), chicken (Rubin et al., 2010;Qanbari et al., 2012), pig (Rubin et al., 2012), and dog (Quilez et al., 2011). However, to our knowledge, studies of selective sweeps related to fish domestication and its impact are scarce, since most breeding programs are relatively new (half a century). Recently, the availability of increasing genomic resources and tools has allowed the tracking of selection events in some fish species such as stickleback (Cano et al., 2006;Jones et al., 2012), Atlantic salmon (Vasemägi et al., 2012;Barson et al., 2015) and catfish (Sun et al., 2014).
Turbot breeding programs started in the 1990s and are currently in the 5th generation of selection (Martínez et al., 2016). Domestic turbot is of Atlantic origin in the three European companies with selective breeding programs. Wild Atlantic turbot populations show genetic homogeneity and are considered as a nearly panmictic unit (Vilas et al., 2015;do Prado et al., 2018b). The turbot genome has recently been assembled (Figueras et al., 2016) and anchored to a high-density genetic map (Maroso et al., 2018). Linkage maps (Bouza et al., 2007(Bouza et al., , 2012Martínez et al., 2008;Hermida et al., 2013) have been used in this species to localize QTL related to the main traits for selective breeding programs, namely sex determination (Martínez et al., 2009), growth (Sánchez-Molano et al., 2011Rodríguez-Ramilo et al., 2014) and resistance to various pathogens (Rodríguez-Ramilo et al., 2011, as a step toward marker assisted selection (Sciara et al., 2018). Furthermore, studies have also provided outlier markers and candidate genomic regions related to adaptive variation in wild (do Prado et al., 2018b) and domestic turbot (do Prado et al., 2018a). Recent advances in genotyping by sequencing and whole-genome resequencing have provided a large number of SNP loci covering the 567-Mb turbot genome (Maroso et al., 2018). These resources bring up new opportunities for genome scanning in order to study genetic diversity and selection signatures in turbot at even higher marker density than that reported in other mammalian livestock (Brito et al., 2017).
The objectives of this study are: (i) to conduct a genomewide characterization of ROH and genetic diversity in wild and domestic turbot using a large number of SNP loci; obtaining firstly meaningful computational conditions to accurately call ROH and estimate the genomic inbreeding coefficient (F ROH ), and secondly the ROH distribution and F ROH in the wild and domestic turbot samples; (ii) to co-localize previously reported QTL and outlier markers associated with productive traits and adaptive variability with genomic regions of high and low genetic diversity suggestive of selection in wild and domestic turbot; and (iii) to explore gene functions and pathways associated with these regions through preliminary genome mining and functional annotation analyses.

Ethics Statement
The animal experimental procedures carried out in this study were approved and conducted in strict accordance with the Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes.

Domestic Individuals
When the study was performed, the four active European hatcheries were founded with individuals from the Atlantic, where populations are genetically rather homogeneous (F ST = 0.0024;do Prado et al., 2018b). The broodstock of CETGA (Aquaculture Cluster of Galicia, Ribeira, Spain), constituted by a mix of these four hatcheries, provided a good representation of domestic turbot (do Prado et al., 2018a), and it was used to analyze both the ROHi (Runs of Homozygosity islands) and the distribution of genetic diversity in our study. Despite different breeding strategies, the same traits have been targeted in the turbot breeding programs, i.e., growth-related traits and resistance to major infectious diseases affecting turbot production (Sánchez-Molano et al., 2011;Rodríguez-Ramilo et al., 2014), suggesting that processes of convergent selection may have shaped the genome of the different turbot broodstock. We analyzed a random sample of the CETGA broodstock consisting of 46 breeders, which were additionally used for the foundation of 44 families (average of 39.5 individuals and range of 36-45 individuals per family) in the framework of the FISHBOOST project (EU 613611; Anacleto et al., 2019;Saura et al., 2019). Twenty-six of these 44 families were used in this study in order to determine the best parameters for ROH calling.
The raw genotyping data for our study was taken from Maroso et al. (2018), which followed a 2b-RAD protocol for SNP genotyping. We considered three different SNP panels for genotyping the CETGA sample and the families analyzed. These panels derived from the same bioinformatic pipeline but used more relaxed or stricter quality control (QC) procedures (Maroso et al., 2018). The panels consisted of 25,511, 21,615 (Supplementary Table 1) and 18,198 SNP, at an approximate density of 1 SNP per 22, 26, and 31 kb, respectively, across the turbot genome (567 Mb; Figueras et al., 2016;Maroso et al., 2018). The performance of these three panels was compared in order to select the most adequate SNP panel for ROH calling.

Wild Individuals
The wild population consisted of 36 individuals from the Major Fishing Area 27-4b (Atlantic NE, North Sea). Previous reports suggested that this is a panmictic population not introgressed from hatchery turbot releases that were carried out in the past in the North Sea (do Prado et al., 2018a). Environmental factors and neutral forces shaping the genome of this population have been reported by do Prado et al. (2018b).
DNA was extracted using the DNeasy blood tissue kit following provider protocol (Qiagen), while library preparation was carried out following a modified version of the ddRAD paired-end protocol from Poland and Rife (2012). Pst1 and Msp1 restriction enzymes were used for targeting RAD-tags followed by an additional agarose gel selection between 350 and 500 bp. After a PCR amplification phase (12 cycles), the library was purified with AMPure beads. The quality was controlled on a bioanalyzer using the high sensitivity DNA reagent (Agilent Technologies). The final library was then sequenced on one lane of Illumina HiSeq4000 at the BGI sequencing platform.
The pipeline for SNP calling, filtering and genotyping developed for domestic turbot by Maroso et al. (2018) was adapted for the wild population data as follows. Raw reads were head-trimmed to 85 bp using Trimmomatic (Bolger et al., 2014), except for the forward sequences on the second library: in order to address differences in library preparation, those reads needed to be head-trimmed first to 87 bp and later tail-trimmed to 85 bp using Stacks' process_radtags pipeline (Catchen et al., 2013). Process_radtags was also used to filter raw reads according to sequence quality, using a 9 bp sliding window and allowing a minimum score of 20 (-w 0.1 -s 20).
Filtered reads were mapped against the latest version of the turbot reference genome (ASM318616v1; Figueras et al., 2016). Bowtie1.1.2 (Langmead et al., 2009) was used despite the existence of a more advanced version (Bowtie2; Langmead and Salzberg, 2012) since control measures are more straightforward in the former version, and the only drawback (processing speed) was negligible for our dataset. The following parameters were set: (i) reads were only considered when aligned to a single site in the reference genome (-m 1); (ii) the mismatch-threshold allowed up to 3 for 85 bp fragments (-v 3); (iii) overlapping was avoided by setting the minimum insert size to 170 bp (-I 170, 85 bp + 85 bp); and (iv) the maximum insert size was set to 500 bp (-X 500).
Mapped reads were processed with the Stacks2.2 gstacks module (Catchen et al., 2013). Default parameters were set, except for a stricter alpha threshold for genotype-calling (-gtalpha 0.01) and without soft-clipping of the 5 and 3 ends (-max-clipped 0). Stacks' populations module (Catchen et al., 2013) was used for further filtering and parameters were set so that only SNPs genotyped in > 56% of the individuals (-r 0.56; 20 individuals) were considered; no additional MAF filters (-min_maf 0) were added.
Using VCFtools (Danecek et al., 2011) we considered only those genotypes with a minimum read depth of 8 reads (-minDP 8). Due to the low read representation in some loci after this filtering step, the 20-individuals-filter was applied again, as well as a filter for monomorphic SNPs. Finally, in those cases where a RAD-tag contained more than one SNP, only the closest one to the 5 end was retained, according to the filtering pipeline used in the domestic broodstock (Maroso et al., 2018).

Obtaining Meaningful PLINK Conditions
The observational approach implemented by PLINK v1.9 was used to call ROHs. The simplicity of this approach allows efficient execution on data from different array platforms or sequencing technologies (Howrigan et al., 2011;Joshi et al., 2015;Ceballos et al., 2018a,b). Tests on simulated and real data showed that the approach used by PLINK outperformed its competitors in reliably detecting ROH.
Different PLINK conditions were used to obtain ROH from the three QC datasets. In order to search for the most meaningful conditions for accurate ROH calling, the minimum number of SNPs that a ROH is required to have (-homozyg-snp), the required minimum density to consider a ROH (-homozygdensity) and the number of SNPs that the sliding window must have (-homozyg-window-snp), ROHs were set up to change from 24 to 40 SNPs for each iteration. Moreover, the number of heterozygous SNPs allowed in a window (-homozyg-windowhet) was set up to 0 (no heterozygous SNP allowed) and 1 (one heterozygous SNP allowed to account for genotyping errors) for each SNP-per-window iteration. The other parameters were constant for all iterations (-homozyg-kb 200; -homozyggap 1000; -homozyg-window-missing 5; -homozyg-windowsthreshold 0.05).
Obtaining the Genomic Inbreeding Coefficient (F ROH ) F ROH was calculated for every PLINK condition tested as the total sum of ROH divided by the total length of the genome, under the lack of sex chromosome heteromorphism in turbot (Taboada et al., 2014), as follows: len ROH len genome Different ROH length cut-offs were used to obtain F ROH : 1, 2, 4, 8 and 10 Mb. In order to find the best cut-off and PLINK conditions, parental offspring F ROH coefficients were regressed against the known F IBD .

Genomic Distribution of ROH
Genomic regions where ROH are particularly prevalent (ROH islands: ROHi) were obtained for each family and for both populations analyzed, the broodstock and the wild ones. In order to search for ROH islands a sliding window of 50 kb was used. In every 50 kb genomic window the number of individuals in ROH was obtained. To know if a specific genomic window had a significant enrichment of ROH across the population a binomial test with P < 2 × 10 −5 with Bonferroni correction for 640 windows was applied.

Genetic Diversity and Selective Sweep Analysis
The selective sweep screening was performed using the SNP datasets for both the broodstock (21,615 SNP panel, see section "Results and Discussion") and the wild populations. Both datasets were analyzed using a 37 SNP sliding window, previously set up as the best fitted for ROH-calling (see Results and Discussion), and 1 SNP sliding at a time. For each window, pooled heterozygosity (Hp) (Rubin et al., 2010(Rubin et al., , 2012, along with the ZHp score, were computed to estimate the patterns of genetic diversity across the turbot genome as follows; Windows with ZHp < −3 or ZHp > 3 were retained as strict regions with significant lower-or higher-than-average genetic diversity (Low-GD; High-GD), respectively. Given that identification of extreme diversity regions of heterogeneous physical length may be too restrictive (Stainton et al., 2016), less strict ZHp thresholds (≤ −2.5 and ≥ 2.5) were also evaluated to identify suggestive regions, considering the unequal genome distribution of the SNP panels. The retained windows were contrasted to the genome positions of QTL markers for the main productive traits (growth, sex determination, resistance to Aeromonas salmonicida, Philasterides dicentrarchi and VHSV) and outlier loci related to adaptive variation previously reported in domestic and wild turbot (Martínez et al., 2009;Rodríguez-Ramilo et al., 2011Sánchez-Molano et al., 2011;do Prado et al., 2018a,b;Sciara et al., 2018), in order to identity putative selective sweeps as considered in other livestock species (Stainton et al., 2016;Zwane et al., 2019).

Gene Clustering and Functional Analysis
To identify the underlying biological functions of the candidate selective sweeps, the boundaries of selected candidate regions were used to retrieve gene lists from Ensembl-BioMart using the updated version of the turbot genome assembly (Maroso et al., 2018). The in silico analysis was conducted using the Functional Annotation Clustering tool implemented in DAVID (Huang da et al., 2009a,b) considering GO-term (Molecular Function 3 and 4; Biological Process 3 and 4), KEGG-pathway and UP_KEYWORDS annotation categories, as well as functional enrichment analyses (P < 0.05). The analysis was performed on overlapping regions between ROH islands (ROHi) and/or lowhigh genetic diversity regions taking as reference functional QTL markers and/or outlier loci.

SNP Genotyping Data
In this study, broad panels of SNPs derived from genotyping by sequencing protocols were genotyped (Supplementary Tables 3,   4), using the most recent turbot genome assembly (Maroso et al., 2018), and were applied to identify selective sweeps in wild and domestic turbot populations. This was specifically addressed by analyzing the distribution of ROH (runs of homozygosity) and by classifying low-versus high-GD (genetic diversity) regions across the turbot genome, an approach that has been frequently carried out in terrestrial livestock species, but scarcely to date in fish (Cano et al., 2006;Vasemägi et al., 2012;Sun et al., 2014). Despite using SNP panels coming from two different genotyping by sequencing techniques, the final marker density was very similar for both the domestic and wild populations, and appropriate for the purpose of the study. Furthermore, an effort was done to homogenize genotyping in both populations by accommodating the filtering and genotyping pipeline to both datasets. A total of 21,615 SNP loci (Supplementary Table 1) were considered for the analysis of the domestic population after checking the performance of different stringency filters to accommodate the data to meaningful PLINK conditions (see below). Regarding the wild population, a total of 126,159,372 raw paired reads were produced, thus representing 3,504,427 raw paired reads on average for the 36 individuals (range: 1,021,421-7,790,680). Approximately 90.4% of these reads were retained after quality filtering through Stacks' process_radtags pipeline. Reference mapping was conducted through the alignment of sequence reads with the current turbot genome assembly (Maroso et al., 2018)  After Stacks' populations filter, 45,906 RAD-tags were retained for analysis in the wild population. Using VCF tools the coverage and 20-individuals filters were applied rendering a total of 28,790 RAD-tags (69,504 SNPs). However, only a total of 28,790 SNP loci were considered, since only one SNP was retained per RAD-tag. After deleting monomorphic SNPs (generated after the coverage filter), a final set of 25,681 SNP loci was obtained (Supplementary Table 2), which represents an average density of 1 SNP per 22 kb in the turbot genome (567 Mb).
The SNP densities handled for the small 567 Mb genome of turbot (Figueras et al., 2016;Maroso et al., 2018) were within the range in similar studies in other fish and vertebrates, representing even higher resolution than that reported in the larger mammal genomes (Marras et al., 2014;Brito et al., 2017).

Obtaining Meaningful PLINK Conditions
Three different SNP panels were tested in 26 domestic families derived from the domestic parental broodstock in order to determine the best ROH calling parameters to fit estimation of kinship coefficients between breeders. The relationship between the minimum number of SNPs to call a ROH and the total ROH length retrieved supports that PLINK finds more ROH with datasets derived from more relaxed QC criteria and when considering fewer SNPs per sliding window by having more SNPs (Supplementary Figure 1). Also, the total sum of ROH changed dramatically by allowing 1 heterozygote SNP per PLINK's sliding window.
In order to assess the outcome of allowing 1 or none heterozygous SNP per PLINK's sliding window, the mean total length of ROH per family and kinship degree was plotted considering eight ROH size classes (Supplementary Figure 2). For each dataset, it is possible to see how by allowing 1 heterozygous SNP per sliding window a larger proportion of long ROH (>12 Mb) is systematically found in those families classified as full-siblings or half-siblings crosses by the IBD analysis. By not allowing any heterozygous SNP, these long ROH are broken, and thus, the number of medium-sized ROH is substantially increased (Ceballos et al., 2018a). The golden standard procedure to call for ROH in humans and livestock (Howrigan et al., 2011;Ferenčaković et al., 2017;Ceballos et al., 2018a) allows 1 or more heterozygous SNP, depending on the sequencing technology and marker density, in order to cope with calling errors and missing data that can mistakenly break ROH, thus increasing the power of detecting IBD.
Furthermore, it is shown that there are small differences between the strict (18,198 SNP) and the medium (21,615 SNP) QC criteria when 1 heterozygous SNP per sliding window is allowed, while large differences are detected with the very relaxed QC criterion (25,511 SNP; Supplementary Figure 2). Relaxed QC criteria enables SNP calling errors, since more relaxed QC procedures allow SNP calling errors associated with paralogous sequences or presence of null alleles (Maroso et al., 2018), capable of generating medium size ROHs. All in all, the use of the 21,615 SNP dataset through a sliding window of 37 SNP and allowing 1 heterozygous SNP seems to be the best fitting parameters for ROH analysis in our case.

Obtaining Accurate Estimates of the Genomic Inbreeding Coefficient (F ROH )
An exploratory analysis was addressed to assess the effect of the different ROH size thresholds (ROH > 1 Mb, > 2 Mb, > 4 Mb, > 8 Mb, and > 10 Mb) on F ROH estimation fitting to IBD (Supplementary Figure 3). Considering the three QC SNP dataset and 1 or 0 heterozygous SNP, the best fitted outcome was achieved when ROH longer than 1 Mb are considered θ IBD (Table 1 and Supplementary Figure 3). Furthermore, Table 1 supports that the best conditions for ROH-calling and accurate F ROH coefficient fitted to IBD are achieved when a medium QC criterion (21,615 SNP), 1 heterozygote SNP allowed per window and ROH longer than 1 Mb are applied. The size of a ROH is influenced by different factors such as recombination or linkage disequilibrium (Peripolli et al., 2016), although it has been found to be approximately correlated with its age: longer ROH will be inherited from recent ancestors, while shorter ROH will be from distant ancestors, as ROHs are broken down by recombination across generations (McQuillan et al., 2008;Marras et al., 2014). In this sense, it has been observed in other species (mammal livestock and humans) that very recent consanguinity, like that produced in an incestuous mating, generates very long ROH, longer than 8 Mb (Ceballos et al., 2018a;Goszczynski et al., 2018). The coefficient of variation (CV) is strikingly smaller when comparing intercepts and slopes of different QC procedures with 1 heterozygous SNP allowed per window. These CV are also smaller when using 1 Mb as ROH size threshold in comparison to longer cut-offs.
However, there is not a perfect correlation between θ IBD and F ROH , the coefficient of variation of F ROH being higher when considering non-inbred families (0 < θ IBD < 0.0076; Supplementary Table 5). Moreover, when comparing the F ROH and the θ IBD interval (considering the medium QC criterion and 1 Het), non-inbred families showed higher differences compared with more inbred ones. Thus, when non-inbred families are considered (Figure 1), the θ IBD estimates of each family's parents show extremely higher differences with the median of the F ROH box-plot than in more inbred families (Figure 1). The discrepancies found among non-inbred families might be related to the misclassification of some parents as fully unrelated, when a certain degree of ancestral consanguinity could occur in the domestic broodstock (do Prado et al., 2018a). Even with these differences, it is possible to conclude that the Frontiers in Genetics | www.frontiersin.org FIGURE 1 | Genomic inbreeding estimated through ROH (F ROH ). F ROH (y-axis) was obtained using a 1 Mb ROH cut-off for each family. Family codes are presented in x-axis and grouped by their parental kinship coefficients θ IBD as follows: UR; unrelated (0.0076 < θ IBD ), FC; first cousin (0.038 < θ IBD < 0.0937), HS; half-sibling (0.0937 < θ IBD < 0.1872) and FS; full-sibling (θ IBD > 0.1872). Red triangles represent the θ IBD of each family's parents.
F ROH (for ROH > 1 Mb) of the offspring correlates the best (R 2 = 0.78) with their parent's θ IBD when using the medium QC criterion, a PLINK's sliding window of 37 SNP and 1 heterozygous SNP allowed.

Genomic Distribution of ROH and ROHi
Large differences were found in ROH distribution between the wild and domestic turbot populations either considering the parental broodstock or additionally including all offspring. This is in agreement with the presence of inbred breeders in the broodstock according to kinship estimates (Figure 2). In the wild population, nearly no ROH were found among the size categories considered and, in fact, ROH longer than 2 Mb were not found at all (Figure 2). As expected, due to low ROH-identification, ROHi were not detected in the wild population. It would be interesting to explore smaller ROH segments (<1 Mb), to complement ZHp analysis in this population. Ideally this should be carried out using the same SNP panel in both the wild and farm populations and if possible, a higher density SNP panel, which is not the case for our study. These are the expected results in a large panmictic population such as Atlantic turbot (do Prado et al., 2018b). Higher marker density would be needed to identify shorter ROH that are IBD, ideally through whole-genome re-sequencing (Marras et al., 2014). In the broodstock population, an important number of ROH were identified, especially with sizes below 1 Mb (which were not considered for F ROH estimation), followed by 4 < ROH < 8 Mb (Figure 2). An atypical amount of medium size-ROH (4−8 Mb) was detected across turbot families, particularly in full-siblings. This observation makes sense if one considers the recent history of turbot aquaculture. Assuming that ROH length correlates with the number of generations elapsed since the common FIGURE 2 | Runs of homozygosity sum distribution for each ROH size category in the domestic and wild populations. ROH < 1 Mb were not considered for F ROH estimation.
The genomic distribution of ROHi might be useful to pinpoint candidate genomic regions under selection, as a preliminary step to search for candidate genes related to productive traits using genome mining (Ku et al., 2011;Goszczynski et al., 2018). The uneven distribution of ROHi observed across the turbot genome supports this idea and clustering of small to large ROHi were detected at particular regions of chromosomes C01, C03, C04, C13, C18, among others. In some cases, various consecutive ROH islands appeared interrupted by small genome stretches (e.g., on chromosome C13), which could suggest interleaved homozygous haplotype deficiency regions associated with deleterious/lethal haplotypes (Pausch et al., 2015). A significant occurrence of lethal recessive alleles was previously reported by Martínez et al. (2008) using diploid gynogenetic families. In the broodstock population, ROH can be further used to investigate genome signatures of selection, contributing to dissect the genetic architecture of quantitative traits and to map genes of interest for breeders. In turbot, the differences observed in ROHi-distribution could point to genomic regions closely linked to genes mainly under selection for growth rate; as these regions tend to exhibit ROHi and lower genetic diversity than the average genome (Mastrangelo et al., 2017). The ROHi distribution might also be associated with the recombination rate landscape across the genome, since ROH hotspots might be associated with genomic regions with low recombination rate. By contrast, ROH are scarcely expected in genomic regions FIGURE 3 | Runs of homozygosity islands genome distribution in the broodstock population. Each ROH island (ROHi) is represented by red segments across turbot (S. maximus) chromosomes. associated with higher fitness (Mastrangelo et al., 2017), such as those harboring genes involved in immune robustness, which have been reported to exhibit high genetic diversity in wild populations (e.g., major histocompatibility complex; Worley et al., 2010;Sutton et al., 2011).

Genetic Diversity and Selective Sweeps
Genome-wide genetic variation within and between populations is the raw material for evolutionary change, selective breeding and sustainable management of genetic resources (Akagi et al., 2016;Brito et al., 2017). The genomic screening in our study rendered 20,823 and 24,889 37-SNP-wide windows for the domestic broodstock and the wild population, respectively. Average Hp was higher in the domestic broodstock (0.27; range: 0.14-0.38) than in the wild population (0.19; range; 0.07-0.31). This is a rather unexpected outcome considering the processes of genetic drift associated with domestic populations, that likely has to do with the different RADseq-derived SNP panels used in both populations. The fraction of loci with rare alleles (MAF (minimum allele frequency) < 0.05) was much lower in the broodstock than in the wild SNP panels (29.62 vs. 53.46%, respectively; Supplementary Figure 4), thus strongly lowering heterozygosity estimates in the wild population. Despite limitations due to the ascertainment bias expected for different SNP panels, the loss of rare alleles in the broodstock sample would agree with genetic drift processes associated with founder effects and breeding practices on the finite turbot broodstock (Luikart et al., 1998;do Prado et al., 2018a;Maroso et al., 2018).
The less strict ZHp criteria (ZHp ≥ 2.5 and ZHp ≤ −2.5) represented about 1.28% (319 windows) and 1.34% (278 windows) of the total sum of windows in the wild and broodstock populations, respectively (Supplementary Figure 5). Windows surpassing the ZHp threshold of 2.5 were cataloged as suggestive high-genetic diversity (high-GD) regions, whereas those with ZHp below −2.5 were considered as suggestive low-GD regions. Both suggestive high-GD and low-GD regions were unevenly distributed across the turbot genome: as such, some chromosomes did not harbor any retained windows (chromosomes C03 and C20 in the broodstock; chromosomes C04, C07, C13, C15, C16 and C22 in wild turbot; Figure 4 and Supplementary Figure 6).
Genomic regions in the vicinity of loci subjected to directional selection showed specific signatures that facilitate the identification of selective sweeps (Sun et al., 2014;Brito et al., 2017;Zwane et al., 2019). Conversely, high genetic diversity has been reported in genomic regions associated with fitness, such as those harboring loci involved in immune robustness (Mastrangelo et al., 2017). The genome-wide scanning of the turbot genome revealed regions with higher and lower genetic diversity (GD) than the average across the genome. Forty five and 71 suggestive (less strict ZHp threshold) windows in the wild and broodstock populations, respectively, matched to 8 (chromosomes C01, C05, C08, C10, C19, and C20) and 12 (chromosomes C02, C05, C06, C08, C09, C12, C19, and C22) respective genomic regions including functional markers (QTL and/or adaptive outlier loci). One marker in chromosome C08 (Sma-E99) was found in a high-GD region in both the domestic and wild populations (Figure 4). These data targeted a total of 19 candidate genomic regions with low-GD and high-GD, suggestive to be under selection, which harbor QTLassociated markers for productive traits related to breeding programs, and/or outlier loci related to adaptive variation in wild turbot (Figure 4 and Table 2). Among the selected significant windows under strict ZHp criteria, 13 wild and 11 broodstock windows matched to two (chromosomes C01 and C05) and one (chromosome C02) genomic regions including functional markers. These data targeted three candidate genomic regions showing low-and high-GD that harbored QTL-associated markers for productive traits related to breeding programs and/or outlier loci related to adaptive variation, thus likely to be under selective pressure (Figure 4 and Table 2).

Integrative Genome Atlas of Genetic Diversity and ROHi
An integrative genome atlas of GD in the broodstock, including the ROHi and low/high-GD regions detected, was developed FIGURE 4 | Genome-wide distribution of Z-transformed pooled heterozygosity (ZHp) in the broodstock and wild populations. ZHp estimates were obtained using 37-SNP sliding windows (colored dots) across turbot chromosomes. ZHp thresholds are represented as dotted horizontal lines (ZHp > 2.5/3.0, red; ZHp < -2.5/3.0; blue). Relevant markers included inside strict low and/or high genetic diversity regions are displayed as black triangles; those inside suggestive low and/or high genetic diversity regions are displayed as black dots.
in this study. Different genomic stretches were highlighted due to the presence of low-/high-GD, ROHi and/or QTL/adaptiveassociated markers. Another atlas for low-/high-GD regions in the wild turbot population was also obtained, although no ROHi were found. Both genomic atlases were anchored to the turbot genome sequence (Maroso et al., 2018) and are shown in Supplementary Figure 7. These data represent useful landmarks to identify, prioritize and investigate putative signatures of selection in the species as a basis for further gene-specific research. In this sense, the application of genome mining and functional enrichment analyses can be useful for uncovering of candidate genes and pathways underlying suggestive genomic regions subject to selection.
In the broodstock population, a significant low-GD region in C02 was associated with a ROHi and a growth-related QTL marker (Sma-USC223; Sánchez-Molano et al., 2011;Sciara et al., 2018). This region was expanded under a less restrictive ZHp threshold and harbored other closely linked growth-related QTL markers (Sma-USC50 and Sma-E183). These three markers were tightly linked to important growth-related genes, such as fgf6 and lep, and their association with growth was further validated in families from different turbot breeding programs (Sciara et al., 2018). Moreover, another locus linked to balancing selection between Baltic and Black Sea population, consistent with parallel adaptation to salinity (7415_42, do Prado et al., 2018b), was also identified within this significant low-GD region at C02. Some suggestive low-GD regions detected under a less restrictive ZHp threshold were also pinpointed by the positional mapping of functional markers in the broodstock population. Among them, a suggestive low-GD region overlapping with a ROHi and a QTL-associated marker for P. dicentrarichii resistance in turbot (Sma-USC38) was identified on C20. Functional enrichment in immune-related genes, such as tfe3, mst1r, myb and fos was previously documented Figueras et al., 2016). Another interesting and previously unexplored region with ROHi and low-GD stretches was detected in the broodstock population. This region, located in chromosome C07 (Supplementary Figure 7), was functionally enriched in terms and pathways associated with the membrane attack complex, complement activation and  regulation, and leukocyte/lymphocyte-mediated immunity. Two particularly relevant immune-related genes were mined in this region: c8a and c8b; coding for the complement component C8 alpha and beta chains, respectively. C8 is a key component of the complement cascade which participates in the formation of the Membrane Attack Complex and directly interacts with pathogens as part of innate immune pathways, thus, likely under pathogen-induced selective pressure (Webb et al., 2015). Finally, one suggestive high-GD region associated with an adaptive-marker, which had been poorly studied to date, was mined in the turbot genome to explore underlying candidate genes and pathways. This high-GD region is located at C06 in the broodstock population (Supplementary Figure 7) and linked to a domestication-associated marker in this species (4559_39; do Prado et al., 2018a). The region showed functional enrichment suggesting a role in response to hypoxia (e.g., ahcy and angpt4 genes linked to methylation regulation and angiogenesis, respectively) and phosphate compound metabolic processes (e.g., alpl involved in bone mineralization and linked to skeletal defects). Hypoxia tolerance, when interacting with water temperature and salinity, has a major impact on fish physiology, including growth and immune response in aquaculture conditions (Lund et al., 2017), and has been associated with selection signatures in the channel catfish (Sun et al., 2014). Genome mining of this region also revealed other immune-related genes, such as ifsf21a, which belongs to a family of membrane receptors, and il17rd, which encodes a component of the interleukin receptor 17 and associated with immune response against bacterial pathogens in fish mucosal tissues Ding et al., 2016).
In the wild population, the two detected regions under strict ZHp criteria included relevant functional markers that were associated with genetic differentiation between farmed (two origins, ORI1 and ORI2) and wild turbot populations (do Prado et al., 2018a). One marker (1412_35; ORI2) was identified in a high-GD region at chromosome C01: This region harbors some key genes associated with the TCA cycle, such as sdha and mdh2. Sdha encodes a major catalytic subunit of succinateubiquinone oxidoreductase and has been previously suggested as a gene under long-term balancing selection/heterosis in humans (Baysal et al., 2007). Mdh2 catalyses the reversible oxidation of malate to oxaloacetate and has been described in Drosophila as an example of balancing selection (Peng et al., 1991). The other marker (13637_50; ORI1) was identified in a low-GD region at chromosome C05, harboring genes like stard5, trappc3 and txlng involved in intracellular traffic and regulation of cellular cycle, the latter one (taxilin gamma) being identified as a target of natural selection in human and Great Apes (Zhao et al., 2019). An interesting suggestive high-GD region at chromosome C08 was identified in the wild population under a less restrictive ZHp threshold (Figueras et al., 2016). This region harbors a relevant functional marker (SmaUSC-E23) closely linked to several immune-related genes and was associated with resistance to major pathogen threats for turbot production (Aeromonas salmonicida and Philasterides dicentrarichii; Rodríguez-Ramilo et al., 2011Martínez et al., 2016).

CONCLUSION
Two SNP datasets with similar marker density from a domestic and a wild turbot population were used to characterize the genome-wide distribution of runs of homozygosity (ROH) and genetic diversity (GD). After fitting ROH-calling parameters, large differences were found in ROH distribution between the broodstock and wild turbot populations, bringing insights into the history of domestication and selection in this species. The study provides useful information to estimate genomic inbreeding and identify selective sweeps in turbot. Genomic signatures of selection were identified by checking the overlapping distribution of ROHi and extreme genetic diversity regions across the genome, along with the presence of functional markers associated with QTL or adaptive variability in turbot. Genome mining and functional analysis on some selected regions provided candidate genes and pathways potentially explaining the action of selection in turbot populations. The atlases of genome diversity in broodstock and wild turbot populations in this study represent useful information for implementing further genomic research and breeding applications in turbot.

DATA AVAILABILITY STATEMENT
All datasets generated for this study are included in the Supplementary Material.

ETHICS STATEMENT
The animal experimental procedures carried out in this study were approved and conducted in strict accordance with the Directive 2010/63/EU of the European Parliament and of the Council of 22 September 2010 on the protection of animals used for scientific purposes.

AUTHOR CONTRIBUTIONS
OA participated in data collection, data analysis and interpretation, and drafting the manucript. FC participated in data analysis and interpretation. AC participated in data collection, data analysis and interpretation. AL, JH-H, and DB participated in data collection and critical revision of the manucript. CB and PM participated in the conception, drafting and critical revision of the manucript.