Construction of a High-Density Genetic Map Based on Large-Scale Marker Development in Mango Using Specific-Locus Amplified Fragment Sequencing (SLAF-seq)

Genetic maps are particularly important and valuable tools for quantitative trait locus (QTL) mapping and marker assisted selection (MAS) of plants with desirable traits. In this study, 173 F1 plants from a cross between Mangifera indica L. “Jin-Hwang” and M. indica L. “Irwin” and their parent plants were subjected to high-throughput sequencing and specific-locus amplified fragment (SLAF) library construction. After preprocessing, 66.02 Gb of raw data containing 330.64 M reads were obtained. A total of 318,414 SLAFs were detected, of which 156,368 were polymorphic. Finally, 6594 SLAFs were organized into a linkage map consisting of 20 linkage groups (LGs). The total length of the map was 3148.28 cM and the average distance between adjacent markers was 0.48 cM. This map could be considered, to our knowledge, the first high-density genetic map of mango, and might form the basis for fine QTL mapping and MAS of mango.


INTRODUCTION
Mango (Mangifera indica L., 2n = 40) is a major fruit crop of the tropics and subtropics, and is often called the "king of fruits" (Mukherjee, 1950, 1957; Purseglove, 1972). Mango has been referred to as an allopolyploid (Mukherjee, 1950), but the recent microsatellite marker studies of Duval et al. (2005), Viruel et al. (2005), and Schnell et al. (2005) indicate that M. indica is diploid. Although mango has a high level of heterozygosity, it has a relatively small genome size (approximately 439 Mb; Arumuganathan and Earle, 1991). In recent years, the area under mango cultivation has expanded steadily worldwide, and mango yields have increased year by year. Currently, more than 100 countries and regions worldwide produce mango, and China is one of the major mango-producing countries. The mango industry has become the mainstay of the local agricultural industry in most of the mango-producing provinces in China.
Mango breeding has gained increasing attention in recent years, and conventional breeding methods (cross breeding, mutation breeding, and seedling selection) are mainly used for creating new varieties of mango. However, these methods are time-consuming and difficult to improve upon because of some inherent characteristics of mango (Iyer and Schnell, 2009). Many important economic traits of mango are controlled by a number of loci. Quantitative trait locus (QTL) analysis and marker assisted selection (MAS) can shorten selection times and accelerate the breeding of new varieties of mango. Genetic maps, especially high-density genetic maps, are particularly valuable tools for QTL mapping and MAS. Four genetic maps of mango have been generated to date. Chunwongse et al. (2000) first constructed a genetic map of mango using 197 restriction fragment length polymorphism (RFLP) and 650 amplified fragment length polymorphism (AFLP) markers based on 31 F1 plants from a cross between Alphonso and Palmer cultivars of mango (M. indica). Kashkush et al. (2001) developed the second genetic map of mango using 34 AFLP markers based on 29 F1 individuals and covering a length of 161.5 cM with an average marker spacing of 4.75 cM. Fang et al. (2003) developed the third genetic map of mango using 81 AFLP markers based on 60 F1 individuals and covering a length of 354.1 cM with an average marker spacing of 4.37 cM. Chunwongse et al. (2015) constructed a partial genetic linkage map spanning a distance of 529.9 cM and consisting of 9 microsatellite and 67 RFLP markers using the same plant materials as in their previous report (Chunwongse et al., 2000). However, these maps were developed on the basis of either a small mapping population (60 or fewer individuals) or a limited number of markers, resulting in relatively low-density genetic maps of limited use for future QTL analysis.
DNA markers such as AFLP, RFLP, random amplification of polymorphic DNA (RAPD), inter simple sequence repeat (ISSR), sequence-related amplified polymorphism (SRAP), and simple sequence repeat (SSR) have been developed in mango. However, the current number of markers is too small to build a high-density genetic map. Specific-locus amplified fragment sequencing (SLAF-seq) technology is an efficient method of de novo single nucleotide polymorphism (SNP) discovery and large-scale genotyping, which is based on reduced-representation library (RRL) and high-throughput sequencing. The efficiency of SLAF-seq was tested using rice and soybean data, and the method has been used to construct the highest-density genetic map of common carp without a reference genome sequence. To date, SLAF-seq has been used successfully to construct high-density genetic maps and study the genomes of many crops (Chen et al., 2013; Huang et al., 2013; Zhang et al., 2013, 2015; Li et al., 2014; Qi et al., 2014; Wei et al., 2014; Guo et al., 2015; Jiang et al., 2015; Xu et al., 2015; Zhu et al., 2015).
We conducted a hybridization breeding study of mango 10 years ago and established many segregating populations. Following several years' field investigation, we selected an intraspecific cross of M. indica for constructing its genetic linkage map. We employed the recently developed SLAF-seq approach to identify a large number of SNP markers for mango and thereby developed a high-density genetic map of mango. The characteristics and value of this genetic map were analyzed and discussed.

Plant Materials
The F1 mapping population consisted of 173 individuals from a cross between M. indica "Jin-Hwang" (female parent) and M. indica "Irwin" (male parent) grown at the South Subtropical Crops Research Institute, Zhanjiang, China. The Jin-Hwang mango has a large, high-quality, greenish yellow fruit. An important trait of the Jin-Hwang mango is its resistance to anthracnose, a disease that occurs frequently during cultivation and postharvest storage and severely affects the development of the mango industry (Lei et al., 2006). The Irwin mango has a medium-sized, high-quality, bright yellow fruit with a crimson red blush. However, it is susceptible to anthracnose (Campbell, 1992; Lei et al., 2006).
Young leaves from the parents and F1 individuals were collected and genomic DNA was isolated by the cetyltrimethylammonium bromide (CTAB) method (Kashkush et al., 2001). The genomic DNA was visualized by electrophoresis in agarose gel and quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, USA).

SLAF Library Construction and High-Throughput Sequencing
The SLAF-seq strategy was used in this study. The genomic DNA of the two parents and F1 population was digested using Hpy166II restriction enzyme [New England Biolabs (NEB), USA]. Subsequently, Klenow Fragment (3′ → 5′ exo−) (NEB) and dATP were used to add a single nucleotide (A) overhang to the digested fragments at 37 °C. T4 DNA ligase was used to ligate the duplex tag-labeled sequencing adapters (PAGE-purified; Life Technologies, USA) to the A-tailed fragments. PCR was carried out using diluted restriction-ligation DNA samples, Q5® High-Fidelity DNA Polymerase (NEB), dNTPs, the forward primer (5′-AATGATACGGCGACCACCGA-3′), and the reverse primer (5′-CAAGCAGAAGACGGCATACG-3′). The PCR products were then purified using Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) and pooled. The pooled samples were separated using 2% agarose gel electrophoresis. Fragments ranging from 264 to 464 bp (with indexes and adaptors) in size were excised and purified using a QIAquick gel extraction kit (Qiagen, Hilden, Germany). Gel-purified products were then diluted. Paired-end sequencing (125 bp from both ends) was performed using an Illumina HiSeq 2500 system (Illumina, Inc., San Diego, CA, USA) according to the manufacturer's instructions.
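The digestion and size-selection steps can be illustrated in silico. The following is a minimal sketch, not the pipeline used in this study: it assumes that Hpy166II recognises GTNNAC and cuts in the middle of the site to leave blunt ends, and the function names, the toy sequence, and the `adapter_bp` parameter are hypothetical.

```python
import re

# Assumption: Hpy166II recognises GTNNAC and cuts between the third and
# fourth base (GTN^NAC), leaving blunt ends.
SITE = re.compile(r"(?=GT[ACGT][ACGT]AC)")  # lookahead so overlapping sites are found
CUT_OFFSET = 3


def digest(sequence):
    """Return the fragments obtained by cutting `sequence` at every site."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, low, high, adapter_bp=0):
    """Keep fragments whose length plus adapter length lies in [low, high]."""
    return [f for f in fragments if low <= len(f) + adapter_bp <= high]


# Toy sequence with two recognition sites (hypothetical, for illustration only).
toy = "AAAA" + "GTCGAC" + "T" * 10 + "GTAAAC" + "CC"
fragments = digest(toy)
print([len(f) for f in fragments])
print(size_select(fragments, low=10, high=20))
```

For real data the window would be 264 to 464 bp with `adapter_bp` set to the combined index and adaptor length, which is not stated in the text.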

Sequence Data Grouping and Genotyping
SLAF marker identification and genotyping were performed using the procedures described by Sun et al. (2013). Low-quality reads (quality score <20 e) were deleted and the raw reads were assigned to the 173 individual samples according to the duplex barcode sequences. After trimming the barcodes and the terminal 5-bp positions from each high-quality read, the clean reads were clustered together according to their sequence identities. Sequences mapping to the same locus with over 90% identity were defined as one SLAF locus (Zhang et al., 2015). SNP loci between the two parents were detected, and the SLAFs with >3 SNPs were removed. Alleles of each SLAF were defined according to parental reads with sequence depth >10-fold and offspring reads with sequence depth >2-fold. As M. indica is a diploid species, one SLAF locus can contain at most four alleles; therefore, the SLAF loci with >4 alleles were eliminated. SLAFs with 2, 3, or 4 alleles were identified as polymorphic and considered potential markers. All polymorphic markers were grouped into eight segregation patterns. As the map was constructed using the F1 population from two heterozygous parents, the markers from the segregation pattern of aa × bb were filtered out.
In order to ensure the genotyping quality, genotype scoring was conducted using a Bayesian approach as described by Sun et al. (2013). Subsequently, three steps were taken to screen the high-quality markers. First, the markers whose average sequence depths were <10-fold in parents and <6-fold in progeny were filtered out. Second, the markers with >10% missing data were removed. Third, the markers with significant segregation distortion (P < 0.05) based on the chi-square test were initially excluded from the genetic map construction and then added later as accessory markers.
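The grouping of polymorphic SLAFs into segregation patterns follows from the parental genotypes alone. The sketch below is one possible way to express that rule, not the code used by the authors; the function name and the pattern labels (written without the multiplication sign) are illustrative assumptions.

```python
def segregation_pattern(female, male):
    """Classify a locus from the two parental genotypes.

    `female` and `male` are pairs of allele labels, e.g. ("A", "G").
    Returns one of the eight pattern names, or "monomorphic" when both
    parents are homozygous for the same allele.
    """
    f, m = set(female), set(male)
    f_het, m_het = len(f) == 2, len(m) == 2
    shared = len(f & m)  # number of alleles the parents have in common

    if f_het and m_het:
        # four, three, or two distinct alleles across both parents
        return {0: "abxcd", 1: "efxeg", 2: "hkxhk"}[shared]
    if f_het:
        return "lmxll" if shared else "abxcc"
    if m_het:
        return "nnxnp" if shared else "ccxab"
    return "monomorphic" if shared else "aaxbb"


def usable_in_f1(pattern):
    """aa x bb loci do not segregate in an F1 and are therefore dropped."""
    return pattern not in ("aaxbb", "monomorphic")


print(segregation_pattern(("A", "G"), ("A", "G")))  # both parents heterozygous, same alleles
print(segregation_pattern(("A", "G"), ("A", "A")))  # only the female parent is heterozygous
```

Every F1 individual at an aa × bb locus carries the same ab genotype, which is why that class carries no linkage information in this population.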
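The second and third screening steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the expected Mendelian ratios per pattern, the tabulated 5% critical values, and all names are supplied here for the example.

```python
# 5% critical values of the chi-square distribution for 1 to 3 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Assumed expected genotype ratios in the F1 for each segregation pattern.
EXPECTED_RATIOS = {
    "hkxhk": (1, 2, 1),
    "lmxll": (1, 1),
    "nnxnp": (1, 1),
    "efxeg": (1, 1, 1, 1),
    "abxcd": (1, 1, 1, 1),
}


def missing_rate(calls):
    """Fraction of individuals without a genotype call (None)."""
    return sum(c is None for c in calls) / len(calls)


def chi_square(observed, ratio):
    """Pearson chi-square statistic of observed counts against a ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(observed, pattern):
    """True when the counts deviate from the expected ratio at P < 0.05."""
    ratio = EXPECTED_RATIOS[pattern]
    return chi_square(observed, ratio) > CRITICAL_05[len(ratio) - 1]


# Hypothetical genotype counts for 173 F1 individuals.
print(is_distorted((90, 83), "lmxll"))
print(is_distorted((110, 63), "lmxll"))
print(is_distorted((40, 90, 43), "hkxhk"))
```

A marker would be retained for the initial map when `missing_rate` is at most 0.10 and `is_distorted` is false.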

Genetic Map Construction
Marker loci were partitioned primarily into linkage groups (LGs) by modified logarithm of odds (MLOD) scores >5. For efficient map construction, the HighMap strategy was used to arrange the SLAF markers in a specific order and correct genotyping errors within LGs. The genetic map was constructed according to the maximum likelihood method (van Ooijen, 2011) and the genotyping errors were corrected by the SMOOTH algorithm (van Os et al., 2005). The missing genotypes were imputed by using a k-nearest neighbor algorithm (Huang et al., 2011). The Kosambi mapping function was applied to estimate the genetic map distances in centimorgans (cM; Kosambi, 1943).
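For reference, the Kosambi function converts a recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 − 2r)] cM. A minimal sketch of the function and its inverse is given below; the function names are illustrative.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse of the Kosambi function: recombination fraction for d in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


for r in (0.01, 0.10, 0.30):
    print(r, round(kosambi_cm(r), 3))
```

At small r the distance in cM is close to 100r, and the correction for interference grows as r approaches 0.5.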

Analysis of SLAF Sequencing Data and Genotyping
After preprocessing, 66.02 Gb of raw data containing 330.64 M reads were generated. The raw data of SLAF-seq were submitted to the Sequence Read Archive (SRA) database (accession number: SRX1741570) of the National Center for Biotechnology Information (NCBI). On average, Q30 (quality scores of at least 30, indicating a 0.1% chance of error) was 85.60% and the GC content was 36.55%. The numbers of reads for female and male parents were 9,403,617 and 10,407,769, respectively. The read numbers for each F1 individual ranged from 1,178,049 to 2,363,336 with an average of 1,796,718 (Table S1). After read clustering, a total of 318,414 SLAFs were detected, and their average sequencing depth was found to be 25.35-fold and 6.54-fold for parents and each progeny, respectively (Figure 1).
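Under the standard Phred definition, Q = −10 log10(P), a quality score of 30 corresponds to an error probability of 0.001 (0.1%) and a score of 20 to 0.01 (1%). The short sketch below shows the conversion and how a Q30 percentage could be computed from a list of base qualities; the function names and the example scores are illustrative.

```python
def phred_to_error_probability(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def q30_percentage(qualities):
    """Percentage of bases with a quality score of at least 30."""
    return 100.0 * sum(q >= 30 for q in qualities) / len(qualities)


print(phred_to_error_probability(30))
print(phred_to_error_probability(20))
print(q30_percentage([38, 35, 12, 30, 29, 40, 33, 31]))  # hypothetical scores
```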
Among the 318,414 SLAFs, 156,368 were polymorphic with a polymorphism rate of 49.11%, while the remaining 162,046 were non-polymorphic or repetitive. After filtering the SLAF markers lacking the parent information, 125,815 polymorphic markers were successfully genotyped and grouped into eight segregation patterns (ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc, and cc × ab; Figure 2). Since the two parents were heterozygous, the markers from the segregation pattern of aa × bb were filtered out.
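The counts reported above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
total_slafs = 318_414
polymorphic = 156_368
genotyped = 125_815

non_polymorphic_or_repetitive = total_slafs - polymorphic
polymorphism_rate = 100 * polymorphic / total_slafs
genotyped_share = 100 * genotyped / polymorphic

print(non_polymorphic_or_repetitive)
print(round(polymorphism_rate, 2))
print(round(genotyped_share, 1))
```

The last figure, the share of polymorphic SLAFs that could be genotyped after removing markers without parental information, is not stated in the text and is derived here only for orientation.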

Basic Characteristics of the Genetic Map
After a series of screenings, 7394 SLAF markers were found to be effective and used for the final linkage analysis. The average integrity of the mapped markers was 99.90%, indicating a relatively high quality of the genetic map. After linkage analysis, among the 7394 markers mapped onto the 20 LGs, 4866 markers were used for the female map, 2585 for the male map, and 6594 for the integrated map (Figure 3, Tables S2-S4). The total genetic lengths of the female, male, and integrated maps were 3144.23, 2747.89, and 3148.28 cM, respectively. The average distances between the adjacent markers in the female, male, and integrated maps were 0.65, 1.07, and 0.48 cM, respectively. All LGs are shown in Table 1. For the integrated map, the longest LG was LG19 (206.90 cM), which contained 520 SLAF markers, while the shortest LG (LG6, 119.44 cM) harbored 255 SLAF markers.
LG16 had the most markers (591), whereas LG8 had the fewest (219). On average, each LG contained 330 SLAF markers. The genetic length of the LGs ranged from 119.44 cM (LG6) to 206.90 cM (LG19), with an average distance between the adjacent markers ranging from 0.35 cM (LG16) to 0.8 cM (LG8). The "Gap ≤ 5" value, which reflected the degree of linkage among markers, ranged from 95.15 to 99.49%, with the largest gap of 18.734 cM located in LG9. A total of 13,844 SNP loci were identified among the 6594 mapped SLAF markers (Table 2).
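The average marker spacings quoted for the three maps can be reproduced from the totals. The sketch below assumes that the average is taken over inter-marker intervals, that is, the number of markers minus one per linkage group; this assumption is ours, and it is the convention that reproduces all three reported values.

```python
N_LGS = 20

# (total length in cM, number of mapped markers), as reported in the text
MAPS = {
    "female": (3144.23, 4866),
    "male": (2747.89, 2585),
    "integrated": (3148.28, 6594),
}


def average_spacing(length_cm, n_markers, n_groups=N_LGS):
    """Mean distance between adjacent markers, counting intervals per LG."""
    intervals = n_markers - n_groups
    return length_cm / intervals


spacing = {name: round(average_spacing(*v), 2) for name, v in MAPS.items()}
print(spacing)
print(round(MAPS["integrated"][1] / N_LGS))  # mean markers per LG
```

Dividing by the raw marker count instead gives about 1.06 cM for the male map, so the choice of denominator is visible only at that level of rounding.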

Quality Evaluation of the Genetic Map
The quality of the mango genetic map was evaluated by constructing haplotype and heat maps. The haplotype maps, which reflect the double crossovers in the population, were developed for the parental controls and 173 offspring using 6594 SLAF markers (Supplementary Material Presentation 1). Most of the recombination blocks were distinctly defined. The missing data for each LG ranged from 0.04% (LG1) to 0.22% (LG8). Most of the LGs were uniformly distributed, suggesting that the genetic maps were of high quality.
The heat maps showed the relationships of recombination between markers from each LG. Pair-wise comparisons between markers were used to assign recombination scores to 6594 markers, after which the heat maps were constructed (Supplementary Material Presentation 2). The resulting maps showed that the order of SLAF markers in most of the LGs was correct.
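The idea behind the heat map can be sketched with a pairwise recombination matrix. The example below is a simplification and not the estimator used by HighMap: it assumes phased calls (0 or 1) from a single informative parent, treats `None` as missing, and uses hypothetical toy data for three markers.

```python
def recombination_fraction(a, b):
    """Share of individuals whose phased alleles differ between two markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def recombination_matrix(markers):
    """Square matrix of pairwise recombination fractions, in map order."""
    return [[recombination_fraction(a, b) for b in markers] for a in markers]


# Toy data: three markers scored in eight individuals (hypothetical).
m1 = [0, 0, 1, 1, 0, 1, 0, 1]
m2 = [0, 0, 1, 1, 0, 1, 1, 1]
m3 = [0, 1, 1, 1, 0, 0, 1, 1]

for row in recombination_matrix([m1, m2, m3]):
    print(row)
```

With a correct marker order, values increase with distance from the diagonal, as they do here for adjacent versus non-adjacent pairs; misplaced markers show up as low values far from the diagonal.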

Segregation Distortion Markers on the Map
Of the 6594 markers, only 174 (2.64%) exhibited significant segregation distortion (P < 0.05) on the genetic map on the basis of a chi-square test (Table 1). The segregation distortion markers were distributed on most of the LGs with the exceptions of LG10, LG17, LG19, and LG20. The frequencies of distorted markers on LG3 (16.09%) and LG5 (12.64%) were higher than those of the other LGs. No significant correlation was observed between the distribution of the distorted and mapped markers. For example, LG16, which possessed the most markers (591 SLAF markers) and covered 205.69 cM, included only one distorted marker. Comparatively, LG8, which possessed the

DISCUSSION
The development of abundant and reliable molecular markers is very important for the construction of a genetic map. In the present study, SNP markers were used for map construction. SLAF-seq technology is an efficient method of de novo SNP detection and large-scale genotyping, which is based on RRL and high-throughput sequencing. The markers developed using SLAF-seq technology were of relatively higher density, better consistency and effectiveness, and lower cost than those developed using traditional methods. Since the development of SLAF-seq technology, it has been applied in many plant studies and has produced remarkable results. Zhang et al. (2013) employed SLAF-seq technology to detect 71,793 high-quality SLAFs, of which 3673 were polymorphic; they used 1233 of the polymorphic markers to construct the first high-density genetic map for sesame. Qi et al. (2014) applied SLAF-seq to develop 12,577 polymorphic SLAFs, and used 5308 of them to construct a soybean genetic map. Similar studies have been conducted on many other crops, such as wax gourd (22,151 polymorphic SLAFs; Jiang et al., 2015), tea plant (25,014 polymorphic SLAFs; Ma et al., 2015), walnut (49,174 polymorphic SLAFs; Zhu et al., 2015), mei (93,031 polymorphic SLAFs; Zhang et al., 2015), grapevine (42,279 polymorphic SLAFs; Guo et al., 2015), red sage (62,834 polymorphic SLAFs; Liu et al., 2016), and cucumber (15,946 polymorphic SLAFs; Zhu et al., 2016). In this study, 66.02 Gb of raw data containing 330.64 M reads were generated through SLAF-seq. There were a total of 318,414 SLAF markers, of which 156,368 were polymorphic. After a series of screenings, 7394 SLAF markers were found to be effective and were used for linkage analysis; however, only 6594 markers containing 13,844 SNP loci were mapped successfully onto the high-density genetic map.
The average sequencing depth of the markers that failed to be included in the map was 535,794, and the average integrity of all samples was 95%, which suggested that this failure was not due to a sequencing problem. The polymorphic markers of mango developed by SLAF-seq technology were more numerous than those of the previously mentioned crops, further demonstrating the potential of SLAF-seq as a low-cost technique to effectively develop numerous and reliable molecular markers for fruit trees.
Genetic maps are the basis of QTL analyses of agronomic traits, which are important for the improvement of breeding programs. However, the construction of genetic maps is difficult, and there is a shortage of genetic maps for fruit trees, especially the tropical ones. There have been a few reports of genetic maps for papaya (Sondur et al., 1996), avocado (Sharon et al., 1997), mango (Chunwongse et al., 2000, 2015; Kashkush et al., 2001; Fang et al., 2003), and guava (Padmakar et al., 2015). Although four genetic maps have been previously reported for mango, there were several deficiencies in them. Firstly, the maximum number of F1 individuals used for these studies was 60. If the size of the population used for map construction is too small, it cannot reflect small recombination rates, and is not beneficial for increasing the efficiency of QTL detection. Secondly, the dominant molecular markers, such as AFLP, were mainly employed, whereas the co-dominant markers, such as SSR, were rarely used. Thirdly, the number of mapped markers was relatively small, which resulted in relatively large average distances between the adjacent markers. The progress of the genetic map construction was very slow, and to the best of our knowledge, no high-density map has been reported to date. The total length of the new genetic map, which was constructed using 6594 SLAF markers in this study, was 3148.28 cM spanning 20 LGs, and the average distance between adjacent markers was 0.48 cM. This map could be considered the first high-density genetic map of mango, and its evaluation using haplotype and heat maps suggested that it was of high quality (Supplementary Material Presentations 1, 2).
Some of the seedling trees of the F1 progenies of the two cultivars of mango began to set fruit two years ago, and the phenotypic data of the two parents and F1 progenies were collected. Disease resistance segregated in the population. Since the Jin-Hwang mango is resistant to anthracnose and the Irwin mango is susceptible to it, the cross between the two cultivars might be used to breed a new variety with high anthracnose resistance and enable the study of the inheritance of resistance. The high-density genetic map of mango constructed here will be applied to map the QTLs of this trait. Additionally, our sequencing results provided a large number of SLAF markers for mango. Because they were explored at the whole-genome level, the sequence and location information of these markers can be used as a reference in future genome assemblies of mango. Furthermore, a total of 13,844 SNPs, which are co-dominant markers, were found among the SLAF markers, and they can be used for comparative genomic studies and future MAS in mango breeding programs. Like other molecular markers, the SLAFs and SNPs developed here can also be applied to other mango cultivars and progenies for identification of germplasm or hybrids and analysis of the genetic diversity among cultivars.

AUTHOR CONTRIBUTIONS
Conceived and designed the experiments: CL, BS, and QY. Performed the experiments and analyzed the data: CL, BS, and HW. Wrote the manuscript: CL, BS. Read and approved the final manuscript: WX, SW.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016.01310

Supplementary Material Presentation 1 | Haplotype map of the integrated maps. Each row represents a marker. Markers are ranked in accordance with the map order. Each of the two columns represents an individual; blank columns are used between two individuals. The first and second columns represent the paternal and maternal chromosomes, respectively. The green and blue areas in the columns represent the first and second alleles from the parents, respectively. The white column represents the source of alleles that cannot be judged. The gray areas represent the deleted alleles.

Supplementary Material Presentation 2 | Heat map of the integrated maps. Markers of each row and column are ranked according to the map order; each small square represents the rate of recombination (r) between the two markers.

Table S1 | Summary of the sequencing data.

Table S2 | SLAF markers and SNP loci on the map. The SNPs within the SLAF marker allele sequences were marked in lowercase letters. The SLAF markers separated by linkage groups are shown in different sheets and the parent from which the alleles were derived is shown.