Development and Validation of 58K SNP-Array and High-Density Linkage Map in Nile Tilapia (O. niloticus)

Despite being the second most important aquaculture species in the world accounting for 7.4% of global production in 2015, tilapia aquaculture has lacked genomic tools like SNP-arrays and high-density linkage maps to improve selection accuracy and accelerate genetic progress. In this paper, we describe the development of a genotyping array containing more than 58,000 SNPs for Nile tilapia (Oreochromis niloticus). SNPs were identified from whole genome resequencing of 32 individuals from the commercial population of the Genomar strain, and were selected for the SNP-array based on polymorphic information content and physical distribution across the genome using the Orenil1.1 genome assembly as reference sequence. SNP-performance was evaluated by genotyping 4991 individuals, including 689 offspring belonging to 41 full-sib families, which revealed high-quality genotype data for 43,588 SNPs. A preliminary genetic linkage map was constructed using Lepmap2 which in turn was integrated with information from the O_niloticus_UMD1 genome assembly to produce an integrated physical and genetic linkage map comprising 40,186 SNPs distributed across 22 linkage groups (LGs). Around one-third of the LGs showed a different recombination rate between sexes, with the female being greater than the male map by a factor of 1.2 (1632.9 to 1359.6 cM, respectively), with most LGs displaying a sigmoid recombination profile. Finally, the sex-determining locus was mapped to position 40.53 cM on LG23, in the vicinity of the anti-Müllerian hormone (amh) gene. These new resources has the potential to greatly influence and improve the genetic gain when applying genomic selection and surpass the difficulties of efficient selection for invasively measured traits in Nile tilapia.


INTRODUCTION
Nile tilapia (Oreochromis niloticus) is an important fresh-water aquaculture species farmed in more than 100 countries including many developing countries in which the species is an essential source of dietary protein (ADB, 2005). Thanks to its fast growth, short generational interval (5 months), relatively small size, adaptability to different environments, and easy to handle, it is also used as a model species for research in fish endocrinology (Seale et al., 2002), physiology (Wright and Land, 1998;Vilela et al., 2003), and evolutionary and developmental biology (Fujimura and Okada, 2007).
Nile tilapia production is supported by more than 20 breeding programs based mainly in South East Asia and some in Africa and America (Neira, 2010). Most of the commercial and farmed Nile tilapia strains are derived from the genetically improved farmed tilapia (GIFT) base strain established in the early 1990s (Eknath et al., 1993). Among these, the Genomar Supreme Tilapia (GST R ) strain which has undergone more than 25 generations of selection.
So far Nile tilapia breeding programs have relied on traditional breeding approaches based on easily measurable phenotypes such as weight and length, and have just recently started to implement modern genome-based strategies, such as marker-assisted and genomic selection (personal communication). Compared to livestock species, aquaculture has been slower to adopt genomebased selection tools largely due to a lack of genomic resources such as reference genomes, SNP arrays, and linkage maps. But in species like rainbow trout, salmon, and common carp where genomic selection is being practiced, priority of utilizing genomic information is on selection for disease and parasitic resistance. For example, resistance against Bacterial Coldwater Disease (BCWD) (Vallejo et al., 2015a(Vallejo et al., ,b, 2017b, infectious pancreatic necrosis (IPN)  and Piscirickettsia salmonis (Yoshida et al., 2017a) in rainbow trout; Piscirickettsia salmonis (Bangera et al., 2017) and resistance against sea lice (Ødegård et al., 2014;Tsai et al., 2016;Correa et al., 2017a) in Atlantic salmon; Piscirickettsia salmonis in coho salmon (Barría et al., 2018); and juvenile growth rate in common carp (Tsai et al., 2015b;Palaiokostas et al., 2018). Similarly, lots of GWAS studies have been conducted in these species, primarily for disease resistance (Correa et al., 2015;Liu et al., 2015;Palti et al., 2015;Vallejo et al., 2017a;Barría et al., 2018), resistance against sea lice Correa et al., 2017b), sexual maturity (Ayllon et al., 2015;Gutierrez et al., 2015) and some carcass quality traits (Sodeland et al., 2013;Tsai et al., 2015a;Gonzalez-Pena et al., 2016;Yoshida et al., 2017b).
The first genome assembly for O. niloticus (released in 2011; Orenil1.0, and updated to Orenil1.1 at the end of 2012 (NCBI, 2018)) was based on short-read sequencing. A newer assembly (O_niloticus_UMD1) was generated using a combination of novel long-reads (generated using Pacific Biosciences Technology) and publicly available Illumina short reads (Conte et al., 2017). Four linkage maps of varying resolution were constructed using markers found with Restriction-site Associated DNA (RAD) sequencing (Palaiokostas et al., 2013), microsatellites and/or AFLP markers (Kocher et al., 1998;Lee et al., 2005;Guyon et al., 2012). The RAD based strategies usually generate a SNP resource of medium density and are highly efficient in species where a reference genome is not available (Robledo et al., 2017). In comparison, a SNP-array offers the advantages of increased genotype accuracy of much higher numbers of markers as well as control over the physical distribution of these across the genome (Robledo et al., 2017). In this paper, we report the development of a 58K SNP-array (Onil50) and construction of a high density linkage map in the commercial strain of Nile tilapia, Genomar Supreme Tilapia (GST R ), which is the continuation of the widespread GIFT-strain.

Origin of Sequenced Fish
The GST R strain of Nile tilapia used in this study originates from the original GIFT population (Eknath et al., 1993). This strain was selected for growth from generation 1 to 14, growth and filet yield from generation 15 to 19, and growth, yield, and robustness from generation 20. Thirty-two individuals (13 males and 19 females) from this population were selected for whole genome sequencing. Twenty of them are from generation 23, selected at random from the breeding nucleus and the rest 12 are from a commercial line formed from generation 20 and selected for growth (Supplementary Figure 6). The graphical summary of the methodology is given in Figure 1.

Whole Genome Sequencing and SNP-Detection
Genomic DNA from these 32 individuals was extracted from fin-clips (preserved in Ethanol) using Qiagen DNeasy columns (Qiagen, Germany). DNA quality was assessed by agarose gel electrophoresis and quantified using a Qubit fluorometer (Thermo Fisher Scientific, United States). After normalization, sequencing libraries were prepared and barcoded using TruSeq sample preparation kit and sequenced (2 × 125) across 10 lanes on an Illumina HiSeq 2500 (Illumina, United States) by a commercial provider. At the time this work was carried out, Orenil1.1 Tilapia represented the highest quality reference genome available (NCBI Assembly Oreochromis niloticus: GCF_000188235.2_Orenil1.1_genomic), and reads were aligned to it using BWA-MEM algorithm in BWA 0.7.12 (Li, 2013) with default parameters. Putative SNPs were identified using FreeBayes v0.9.20 (Garrison and Marth, 2012) with parameters genotype-qualities and experimental-gls. Using vcffilter, SNPs with a QUAL score value (phred) of ≤20 were removed.

SNP-Filtering
The initial set of putative SNPs was divided into three groups including SNPs located on scaffolds assigned to linkage groups (LGs) of the assembly, SNPs on unassigned scaffolds, and SNPs detected within the mitochondrial genome. SAMtools v1.2/bcftools (Li et al., 2009) was then used to filter out variants according to the following criteria: a SNP was removed if; (i) located within 5 bp to an indel, (ii) had more than one alternative allele, (iii) the sequencing depth exceeded 700 reads, (iv) its alleles were A and T, or C and G (these require twice as many 'probes' on Affymetrix SNP arrays as other SNP allele combinations), (v) if sample genotype quality (GQ) was <30, (vi) minor allele frequency (MAF) <0.05, (vii) all samples were heterozygous for the given SNP, (viii) the variant was detected in fewer than 28 of the samples sequenced. Finally, Hardy-Weinberg Equilibrium (HWE) exact test was calculated using PLINK2 (Chang et al., 2015) and SNPs that showed departure from HWE (P < 0.05) were removed.

SNP-Selection
After filtering, 2.76 million putative high-quality SNPs remained. Based on their relationship to genes and physical distribution, Frontiers in Genetics | www.frontiersin.org a subset of these was identified for inclusion on the array. SNPEff v 4v1l (Cingolani et al., 2012) was used to identify SNPs with high and moderate effects (including for example nonsynonymous variants). The SNPs with high effects are assumed to have disruptive impact in the protein codification, probably causing protein truncation, loss of function or triggering nonsense mediated decay, e.g., stop gained, frameshift variant, etc.; whereas the SNPs with moderate effects are assumed to be nondisruptive, but they might change protein effectiveness, e.g., missense variant, inframe deletion, etc. From the list of almost 38,000 variants with high and moderate effects, approximately 10,000 were chosen avoiding SNPs within 10 kb of another. A python script was used to fill in gaps and produce a relatively even distribution of SNPs selected at ≈12 kb intervals across the 22 LGs, and ≈33 kb across unmapped scaffolds >50 kb in length. The script was designed to fill a distribution gap with a variant falling within a small size selection window with highest MAF being the main criteria. SNPs from the mitochondrial genome were selected manually. The selected subset of SNPs (n = 56,050) were submitted to in silico validation for Affymetrix Bioinformatic Service. The Affymetrix in silico probe set design and evaluation pipeline predicts the performance of SNPs and calculates a conversion probability value (p-convert value: representing the probability of a given SNP converting to a reliable SNP assay on the Axiom array system) using various criteria including: binding energy, GC content, and the expected degree of non-specific hybridization to multiple genomic regions. Based on the p-convert values, they classify the SNPs into different categories: recommended, neutral, not-recommended, etc. 46,877 SNPs under the categories recommended and/or neutral from probe scoring recommendations were retained and 24,349 extra SNPs were chosen from the regions were the SNPs were discarded. A total of 71,226 SNPs (46,877 + 24,349) were sent back to Affymetrix for in silico SNP validation. Finally, the best 58,466 SNPs were chosen to tile on the array based on their probe scoring recommendation (at least one of the strand were recommended, or got neutral category). Upon its release, SNP positions were redefined based on the O_niloticus_UMD1 assembly (Conte et al., 2017) using NCBI's Genome Remapping Service.

Construction of Genetic Map
Genotyping Genomic DNA was isolated from ethanol-preserved fin clips collected from 1882 Nile tilapia samples using Qiagen 96 DNeasy Blood & Tissue Kits according to manufacturer's instructions (Qiagen, Germany) for map construction. These samples were from different generations of GST R strain within the same breeding population. After quantification and quality checking of DNA, samples were genotyped on the Onil50 array at Center for Integrative Genetics (CIGENE) in Norway.
The unfiltered dataset contained 58,466 SNPs, which were analyzed using the Best Practices Workflow with default settings (sample Dish QC ≥ 0.82, QC call rate ≥97; SNP call-rate cutoff ≥97) in the Axiom Analysis Suite software. Ten samples were excluded from analyzed dataset because of the low call rate. Then, the SNPs classified as PolyHighResolution or NoMinorHom [most informative categories from Best Practices Workflow in Axiom Analysis Suite software (Thermo Fisher Scientific Inc, 2018)] were selected, leaving us with 43,014 SNPs.

Family Structure
The 1872 genotyped individuals could be divided into two groups based on the generations of the breeding population. Group 1 (n = 1124) comprised individuals collected following the branching of the 20th generation, and were factorially crossed against each other after two generations. The experimental design for Group 1 is described in Joshi et al. (2018) and was primarily intended to partition the non-additive genetic effects in this population. Fish from Group 2 (n = 748) were obtained from the 24th and 25th generations of GST R (Supplementary Figure 6).
Parentage assignment was done using an exclusion method which eliminates animals from a list of potential parents when there are opposing homozygotes between parents and offspring (Hayes, 2011). We used all the 43,014 SNPs and permitted a maximum of 100 conflicts between parents and offspring, representing approximately 0.24% of all genotypes. A total of 689 offspring was divided among 41 full-sib families containing ≥8 offspring (mean offspring per family, µ = 16.81). Group 1 (468 offspring with 19 parents) had 34 full-sib families (µ = 13.77 ± 5.5) and Group 2 (221 offspring with 14 parents) had 7 full-sib families (µ = 31.57 ± 7.23). The structure of Groups 1 and 2 is shown in Supplementary Tables 1, 2.

Linkage Map Construction
The SNPs displaying a MAF ≤0.05 (2,466 SNPs) were further filtered for linkage map construction. All the retained SNPs (n = 40,548) had SNP call rate >0.97, so this criteria was not used for filtration. Phenotypic sex was known for a subset of families (221 offspring + 33 parents) and was coded as 12 for males and 11 for females and included in the genotype file (n = 40,549) before running Lepmap2 (Rastas et al., 2013) for linkage map construction. Lepmap2 uses information from full-sibs and their parents to assign SNP markers to LGs, and applies standard hidden Markov model (HMM) to compute the likelihood of the marker order within each LGs. First, the SNPs were used to construct the preliminary linkage map (Build 1), which was used to anchor, order, and orient the scaffolds in the O_niloticus_UMD1 assembly and upgrading this assembly to O_niloticus_UMD_NMBU (Conte et al., 2018). Eventually, the final physical integrated genetic linkage map (Build 2) was constructed from the order of the markers based on the physical position of the SNPs in O_niloticus_UMD_NMBU assembly.

Build 1: To Anchor SNPs to Different LGs
SeparateChromosomes (a module in Lepmap2) was run testing lodLimits from 1 to 50 and a sizeLimit = 100; a lodLimit of 10 resulted in 22 LGs, also with the lowest number of markers not assigned to any LG. JoinSingles was run to assign single markers to LG groups and tested with lodLimits from 1 to 15 and lodDifference = 2; a lodLimit of 4 was selected as this joined the highest number of single markers. OrderMarkers was used to order the markers within each LG. Each LG was ordered separately and replicated 5 times with commands: numThreads = 10, polishWindow = 30, filterWindow = 10, useKosambi = 1, minError = 0.15, and the order with highest likelihood was selected as the best order. For sex averaged map OrderMarkers was run similarly by adding sexAverage = 1.

Build 2: Integrated Linkage Map Based on the Order of the SNPs in the New Assembly
Sequence containing each SNP was used to find the physical position of the SNPs in the O_niloticus_UMD_NMBU assembly. Physical position information was used to adjust the order of the SNPs within respective LGs and Lepmap2 was rerun to produce the final linkage map.

Array and SNP-Performance
To get a more comprehensive overview about the array and SNP performance, 3119 additional Nile tilapia samples were genotyped using Onil50 array. The raw dataset of the 1872 samples which were used for linkage mapping was combined with the dataset of the 3119 samples and were analyzed together using the Best Practices Workflow with default parameters in Axiom Analysis Suite software (Thermo Fisher Scientific Inc, 2018). Four quality parameters were assessed on those samples filtered through Dish QC (DQC ≥ 0.82), QC call rate (QC CR ≥ 97) and plate QC (Percent of passing samples ≥50 and average call rate for passing samples ≥50) criteria: MAF, SNP call rates, Hardy Weinberg (HW) p-values, and clustering. SNPs could be divided into six different types on the basis of formation of clusters (i) "PolyHighResolution" -formation of three clusters with good resolution; (ii) "NoMinorHom" -formation of two clusters with no samples of one homozygous genotype; (iii) "MonoHighResolution" -a single cluster of a homozygous genotype; (iv) "OTV, " off-target variants -three good clusters, with a single additional off-target cluster caused by variants in the SNP flanking region; (v) "CallRateBelowThreshold" -the SNP call rate was below the threshold (0.970), but other cluster properties were above the threshold; and (vi) "Other" -the SNPs were not grouped into any of the previous categories.
SnpEffv4.3i (Cingolani et al., 2012) was used to predict functional effects of the 58,340 SNPs which were remapped to O_niloticus_UMD1 assembly.

SNP Selection and Array Development
The sequencing of 32 Nile tilapia generated 4.22 × 10 9 reads representing an average of 17.7× coverage per individual (stdv = 4.2, min = 9.4, and max = 27.7). After alignment, on average 98% of reads were mapped to the Orenil1.1 assembly yielding 12.78 million variants of which 10.5 million were putative SNPs. Rest 2.2 million were insertions, deletions, multinucleotide polymorphisms, etc. After performing multiple steps of filtering described in the section "Materials and Methods, " a subset of 2.76 million SNPs was retained and a final set of 58,466 SNPs were selected for assay design and printed on the Onil50 array.
Around 99.8% of the SNPs from the array were successfully re-mapped to the new O_niloticus_UMD1 assembly (Table 1). Remapping revealed an increase in the number of SNPs mapping to LGs and a corresponding decrease in the number of SNPs on unmapped scaffolds. The average variant density per LG on the Orenil1.1 assembly is 12.5 ± 0.35 kb (12.1-13.7 kb). However, since the O_niloticus_UMD1 assembly includes an additional 87 Mb assigned to LGs the average variant density increased to 15.5 ± 4.06 kb (13.3-32.3 kb) ( Table 1). Additional information about inter-SNP distance and standard deviation can be found in Supplementary Table 4. Physical size of LG03 increased by 2.4 times in the new assembly, thereby increasing the number of SNPs assigned to this LG by 2.3 times.

Performance and Validation of the SNPs in the Array
A total of 4947 samples out of 4991 passed the Dish QC threshold. A total of 4858 samples (97.3%) were left after being subjected to sample call rate. Based on the cluster profile, over 74% of the 58,466 SNPs were classified as PolyHighResolution. More detailed information about the sample and SNP statistics are shown in Figure 2.
Prediction of functional effects of the 58,340 remapped SNPs from the Onil50 array resulted, in most cases, in multiple annotations per variant. The effects with the highest putative impact are included for summary in Table 2. The majority of the SNPs are intronic (36.92%) or intergenic (20.80%) variants, while about 15% of are non-synonymous mutations. Since the SNP selection process specifically targeted variants with a potential functional effect, these variants are expected to have direct effect on traits of interest.

Linkage Map
A total of 40,549 SNPs were retained following quality filtering, and 99.78% of these (n = 40,467) were ordered within the 22 LGs corresponding to the karyotype of Nile tilapia (Supplementary Figure 1). Since, Build 1 linkage map is an intermediate step for the extension of the O_niloticus_UMD1 to the O_niloticus_UMD_NMBU genome assembly (Conte et al., 2018), which is not the aim of this paper, we give only a brief summary of the results. The genetic and physical maps were generally in good agreement with a correlation of ≥0.96 between the reference genome position and the genetic map position of the SNPs (Supplementary Figure 1). This high correlation with the physical map demonstrates that the genetic map is of high quality and is highly accurate.
A total of 40,186 SNPs mapped to 22 LGs in Build 2 linkage map. The consensus (sex-averaged) map adds up to 1469.69 cM, with individual LG lengths ranging from 56.04 cM (LG19) to 96.68 cM (LG07) ( Table 3). The average genetic distance across the LGs was 66.8 cM. The number of markers per LG varied from 1349 to 3391, with an average of 1827 markers per LG ( Table 3). As a consequence of the SNP selection, which sought to position a SNP every 12 kb, the number of markers was mostly proportional to the size of the LG (Figure 3). A notable exception is LG03 where the inclusion of previously unassigned   scaffolds has trippled the physical size without a corresponding tripling of SNP numbers. The SNP density (SNPs/Mb) varied across the genome from 19.68 to 56.83 (see also Figure 4 and Supplementary Figures 2-4).
In this study, paternal and maternal informative markers were used to construct specific male and female maps ( Table 3). Around one-third of the LGs showed a different recombination rate between sexes (Supplementary Figure 5), with male and female map lengths differing by a factor of 1.2 (1359.6 and 1632.9 cM, respectively). Generally, female maps were found to be larger in all LGs with the exception of LG02, LG06, and LG22. Sigmoidal pattern of recombination, with no recombination at both ends of the LGs, was seen in almost all LGs (Figure 4).

High-Density Linkage Map for Nile Tilapia
Existing linkage maps for Nile tilapia contain relatively few markers unevenly distributed across LGs (Supplementary Table 3). As a consequence, regions in the genome have poor SNP coverage. By stringently selecting SNPs with an even physical distribution in the genome the linkage map presented includes 10 times more SNPs and fewer gaps, compared to the most recent map (Palaiokostas et al., 2013). Ferreira et al. (2010) categorized the karyotypes of O. niloticus into 3 meta-submetacentric and 19 subtelo-acrocentric chromosomes. The steepness of the curve in Figure 4 shows the recombination level, with flat lines representing little or no recombination, which may suggest the possible location of the centromeres. There are some discontinuities present in LG03 and LG15, suggesting that our linkage map lacks the SNPs in those regions. These gaps might be due to missing sequence in the assembly or due to the assembly errors. Wide recombination deserts (areas with no recombination) are seen in the initial and/or end regions of most of the LGs, generally up to 5 Mb and sometimes up to 10 Mb (e.g., LG09 and LG10), indicating the presence of subtelo-acrocentric chromosomes. Because of these recombination deserts, most of the LGs, irrespective of the sexes, showed sigmoidal pattern, which is unusual when compared to other fish species. In channel catfish (Li et al., 2014), salmon (Tsai et al., 2015b), Asian seabass (Wang et al., 2015) and stickleback (Roesti et al., 2013) the recombination rates were generally elevated toward the end of the LGs. The possible explanation might be that the GST R strain used in this study is derived from the GIFT strain, formed from crossing among four wild and four cultured Asian strains (Eknath et al., 1993). When the individuals from two different strains are crossed together, the offspring is heterozygous and causes difficulty in recombination producing stretch with low recombination, which might have resulted to the unique recombination pattern. Low recombination at the end of LGs were also observed in the hybrid crosses of Lake Malawi cichlids (Conte et al., 2018). Nile tilapia was shown to have a sex-specific pattern of recombination with the female map generally being larger than the male map (Lee et al., 2004). The genetic basis for the differences in the recombination rate between sexes has still not been found, but Li et al. (2014) has listed three major hypotheses. First, the selection perspective hypothesis (Lenormand and Dutheil, 2005;Gruhn et al., 2013), proposes that the selection pressure is higher in male compared to female gametes during the haploid life stage and this male-specific selection leads to decrease in the male recombination rate to maintain the beneficial haplotypes. Second, the compensation hypothesis (Coop and Przeworski, 2007), proposes that the recombination rate is higher in females compared to males to compensate for the less stringent checkpoint for the non-recombinant (achiasmatic) chromosomes. Third, the recombination pathway hypothesis (Gruhn et al., 2013), suggests that the chromatin differences established prior to the onset of the recombination pathway causes the differences in the recombination between the two sexes.
LG23 showed a unique recombination pattern, a flat line of around 5 Mb, in the center of the LG, for which there also is a sex difference in recombination rate. In O. niloticus, major XY sex determining regions have earlier been mapped to LG1 (Palaiokostas et al., 2013) and LG23 (Karayücel et al., 2004;Shirak et al., 2006;Eshel et al., 2011Eshel et al., , 2012. Further, tandem duplication of the variants of the gene anti-Müllerian hormone (amh) in LG23 has been identified as the male sex determinant in Nile tilapia (Li et al., 2015). These variants of amh gene have been mapped to around 35.4 Mb region of Nile tilapia genome (discussed below in section "Sex Locus Mapped in the Vicinity of amh Gene"), which is the same region where the unique recombination pattern is seen, suggesting limited recombination around the sex-determining genes in O. niloticus. Further, LG23 was formed by the fusion of two LGs during the evolution of cichlids (Liu et al., 2013), which might be another reason for this unique recombination pattern.
The fusion of the LGs during the evolutionary process also has an effect on the size of the LGs, as it is believed that the ancestors of cichlids had 24 chromosome pairs, which eventually became 22 pairs (Majumdar and McAndrew, 1986;Ferreira et al., 2010). Our genetic map shows that LG07 is the largest, which has been shown to be formed by the fusion of two LGs during lineage evolution Liu et al., 2013;Conte et al., 2017).

Array Content and Performance
SNP performance was validated by genotyping around 5000 individuals from different generations of the GST R strain of Nile tilapia. Around 75% of the SNPs on the array perform well generating three highly differentiated allelotype clusters (i.e., PolyHighResolution). Around 9% of the SNPs were found to depart from HWE (p < 0.01), but it has to be noted that the population genotyped for the validation is the commercial strain that has undergone up to 25 generations of selection. Hence, these departures might be important as they could represent regions under selection, domestication and the outcome of assortative mating (Gutierrez et al., 2016;Adenyo et al., 2017). Whereas the extreme departures might suggest lethal recessive mutations and/or recent mutations or copy number variants (Lee et al., 2008;Graffelman et al., 2017).
For future revisions, the array could be improved by increasing the SNP density in highly recombinant regions of specific LGs like including LG03 and LG23. The use of genetic distance rather than the physical distance to select the SNPs is probably the best option for equidistant SNP distribution across the genome.

Sex Locus Mapped in the Vicinity of amh Gene
Sex determination is one of the important aspect in commercial tilapia production, as males are found to grow faster than females and unisex production is a main method to avoid propagation in production ponds or cages. Sex determination in fish is more complicated than mammals as it tends to be dependent on both genetic and environmental factors (Ezaz et al., 2006;Baroiller et al., 2009). Besides hermaphrodite species, two main sex determination system exist: XY and ZW, and they are both present in fish species. It has also been seen that phylogenetically The correlation between the genetic distance of SNPs (cM) on the linkage map and the physical distance (bp) according to the reference genome assembly. F, M, and SA represents female, male, and sex-averaged maps, respectively.
Frontiers in Genetics | www.frontiersin.org   (Mair et al., 1991;Campos-Ramos et al., 2003). In our study, the sex locus for Nile tilapia was coded using the XY system and mapped to LG23 (Table 4) as reported previously in several studies (Karayücel et al., 2004;Shirak et al., 2006;Eshel et al., 2011Eshel et al., , 2012. SNP AX-164998274 (SNP probe: AGGTGTGTGGTCTTTCTTTGGAAGTCTGCAGAGT G[C/T]TTCAATAACACAGGTATGGTTTCTCGTTGTGATTC) mapped to the same genetic position as the sex locus. The most  likely position of sex locus (pos. 34.5 Mb/40.53 cM on LG23) maps close to the anti-Müllerian hormone (amh) gene, previously characterized as sex determining gene in Nile tilapia (Li et al., 2015).

Implications in Tilapia Industry
Tilapia is a commercially important aquaculture species, with more than 3.9 million tons of fish and filets being traded in 2015 (FAO, 2017) and more than 20 breeding programs (Neira, 2010). The present SNP array and linkage map has the potential to greatly improve the genetic gain for this economic important species, and help surpass the difficulties of efficient selection for the invasively measured traits, the traits which cannot be measured directly on the candidate broodstock fish, but are only measured on the sibs of the candidates, e.g., disease resistance, filet yield, etc. These tools may also be useful to bridge the genotype-phenotype gap in Nile tilapia, which has been pursued for a long time (Gjøen, 2004). A major capability of these resources will be to find economic important QTLs or chromosome regions affecting economically important traits like disease resistance, filet traits or feed efficiency. In order to fine map these QTLs, it is essential to have a high-resolution linkage map. The dense linkage map can also be integrated with physical maps to position and orient scaffolds along LGs, thereby producing genome assemblies of higher quality.
Another important implication will be to facilitate the shift from traditional breeding strategies to genomic selection in Nile tilapia. In the future, breeding goals in Nile tilapia will include many invasively measured traits. Genomic selection will significantly help us to overcome these challenges, increasing the profitability and the genetic gain (Meuwissen et al., 2001;Nielsen et al., 2009;Sonesson and Meuwissen, 2009;Vela-Avitúa et al., 2015;Hosoya et al., 2017;Houston, 2017). Finally, this will also help to discern more accurately the additive from the non-additive genetic effects, thereby increasing the selection accuracy and the possibility to utilize non-additive genetic effects (Varona et al., 2018).
Another obvious use of the SNP-array will be in the parentage assignments. The drawback of the conventional breeding designs in Nile tilapia using PIT tags is the confounding of the fullsib family effects (due to communal rearing of full-sibs) and maternal environmental effects (due to mouth brooding), making it difficult to detangle the various variance components accurately , which ultimately decreases the accuracy of the selection.

CONCLUSION
We present the first SNP-array, the Onil50-array, containing ca 58,000 SNPs for Nile tilapia, which was validated in close to 5000 individuals. Further, we constructed a high density integrated genetic and physical linkage map, with LGs showing sex-differentiated sigmoidal recombination patterns. These new resources has the potential to greatly influence and improve the genetic gain when applying genomic selection and surpass the difficulties of efficient selection for invasively measured traits in Nile tilapia.

DATA AVAILABILITY
The assemblies used in this study can be found in NCBI using the following accessions: Orenil1.1 = GCF_000188235.2, O_niloticus_UMD1 = MKQE00000000, and O_niloticus_UMD_ NMBU = MKQE02000000. The whole genome sequence data used for SNP detection has been deposited in the European Nucleotide Archive and can be found in EMBL-EBI_website using accession PRJEB28330. Linkage map generated from this study can be found in the Figshare: https://figshare.com/s/ 8427b97cf6e623173232.

ETHICS STATEMENT
Sampling of DNA was done in accordance with the commercial practice and norms by Genomar Genetics AS.

AUTHOR CONTRIBUTIONS
HG, AA, and MK conceived and designed the study. AA coordinated biological sampling. MK and MÁ were responsible for array design and MÁ performed lab work and initial analysis of results. RJ constructed the linkage map, while SL integrated the genetic and physical maps. RJ and MÁ prepared the draft manuscript which was reviewed and edited by HG, MK, AA, and SL. All authors read and approved the manuscript.

ACKNOWLEDGMENTS
We would like to acknowledge Anders Skaarud from Genomar Genetics AS for his help in sample management. We would also like to thank Harald Grove, Torfinn Nome, and Tim Knutsen for their valuable advice and help with bioinformatics analyses of sequence and SNP data. The SNP array was developed in cooperation with Affymetrix, Inc., and we particularly thank the following Affymetrix personnel for their direct contribution: Lakshmi Radhakrishnan and Alessandro Davassi. We also thank Silje Karoliussen for her help in genotyping and raw data analyses. Similarly, we also acknowledge Solomon Boison and Luqman Aslam for their interaction during linkage map construction.