Construction of a High-Density Genetic Map and Analysis of Seed-Related Traits Using Specific Length Amplified Fragment Sequencing for Cucurbita maxima

Seed traits are agronomically important for Cucurbita breeding, but the genes controlling seed size, seed weight and seed number have not been mapped in Cucurbita maxima (C. maxima). In this study, 100 F2 individual derived from two parental lines, “2013-12” and “9-6”, were applied to construct a 3,376.87-cM genetic map containing 20 linkage groups (LGs) with an average genetic distance of 0.47 cM using a total of 8,406 specific length amplified fragment (SLAF) markers in C. maxima. Ten quantitative trait loci (QTLs) of seed width (SW), seed length (SL) and hundred-seed weight (HSW) were identified using the composite interval mapping (CIM) method. The QTLs affecting SW, SL and HSW explained a maximum of 38.6%, 28.9% and 17.2% of the phenotypic variation and were detected in LG6, LG6 and LG17, respectively. To validate these results, an additional 150 F2 individuals were used for QTL mapping of SW and SL with cleaved amplified polymorphic sequence (CAPS) markers. We found that two major QTLs, SL6-1 and SW6-1, could be detected in both SLAF-seq and CAPS markers in an overlapped region. Based on gene annotation and non-synonymous single-nucleotide polymorphisms (SNPs) in the major SWand SL-associated regions, we found that two genes encoding a VQ motif and an E3 ubiquitin-protein ligase may be candidate genes influencing SL, while an F-box and leucinerich repeat (LRR) domain-containing protein is the potential regulator for SW in C. maxima. This study provides the first high-density linkage map of C. maxima using SNPs developed by SLAF-seq technology, which is a powerful tool for associated mapping of important agronomic traits, map-based gene cloning and marker-assisted selection (MAS)-based breeding in C. maxima.


INTRODUCTION
The squash Cucurbita maxima Duch (2x = 2n = 40) is an important cucurbitaceous plant that is widely grown worldwide as a commercial crop. It has high nutritional value and healthprotective properties and thus has received increasing interest and popularity in breeding. Research on squash genetics and molecular biology has also progressed rapidly in recent years.
A high-density genetic map is not only a key resource for studies on genome structure and genetic relationships but also provides the basis for quantitative trait locus (QTL) mapping and marker-assisted selection (MAS) based on the numbers of polymorphic markers . Since the first genetic linkage map of the genus Cucurbita was constructed with isozymes in 1986 (Weeden and Robinson, 1986), several genetic maps have been developed based on molecular markers. The genetic maps of Cucurbita pepo have been constructed with random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR), sequence characterized amplified region (SCAR), and single-nucleotide polymorphism (SNP) markers and spanned 1954-2817.6 centimorgans (cM) (Brown and Myers, 2002;Zraidi et al., 2007;Gong et al., 2008a;Esteras et al., 2012;Monteropau et al., 2017). The high-density genetic map of C. pepo contained 7,718 markers with an average genetic distance between markers of 0.4 cM (Monteropau et al., 2017). The genetic maps of Cucurbita moschata were developed using SSR and SNP markers and spanned 1445.4-3087.03 cM (Gong et al., 2008b;Zhong et al., 2017). The high-density genetic map of C. moschata contained 3470 markers with an average genetic distance of 0.89 cM (Zhong et al., 2017). Compared with those of C. pepo and C. moschata, few maps of C. maxima with a limited number of markers have been published. The maps of C. maxima spanned 991.5-2566.8 cM, and the latest genetic map contained 458 markers with an average marker density of 5.60 cM (Singh et al., 2011;Ge et al., 2015;Zhang et al., 2015a). A genetic map with a low density of markers has limited application for further fine QTL mapping and MAS in squash breeding; therefore, a high-density genetic map of C. maxima is urgently needed.
Seed traits are important agronomic traits for squash breeders. Larger seeds can provide more nutrients for seedlings, which can result in increased resilience against environmental stress. Many artificial-selection-based studies of squash breeding have focused on increasing seed size and weight. Most published reports related to differences in seed size have focused on the diversity and evaluation of the combining ability (Nagar et al., 2017;Darrudi et al., 2018), whereas few reports have focused on gene locations. Four seed width (SW)-associated QTLs were located using 193 F 2 C. maxima individuals with logarithm of odds (LOD) values ranging from 3.49 to 4.22 and percent variance explained (PVE) values ranging from 2.87 to 29.68 cM (Tan et al., 2013). However, genes controlling seed length (SL), hundred-seed weight (HSW), and seed number per fruit (SNF) have not been mapped.
The QTLs of important seed traits in C. maxima have not been cloned. Therefore, a high-density linkage map with more informative and high-throughput markers is urgently needed. To construct a high-density SNP map and identify QTLs for significant seed traits in C. maxima, we used SLAF-seq technology for genotyping a population with 100 F 2 individuals derived from two inbred lines with different agronomic traits. With this map, QTLs associated with SW, SL, SNF and HSW were successfully identified in narrow candidate regions. To locate genes controlling SL and SW, QTL mapping with cleaved amplified polymorphic sequence (CAPS) markers was performed under different growing conditions and in different populations over a two-year period. Then, the coding genes mapped in the candidate region were predicted.

Plant Material and DNA Extraction
An F 2 mapping population with 100 individuals was derived from a cross between the high-generation inbred lines "2013-12" and "9-6" for SLAF-seq. An additional 150 F 2 individuals were selfpollinated to contrast with the F 2:3 family for locus verification using CAPS markers. Seedlings of the parental F 2 population and F 2:3 (10 plants per family) individuals were planted in a greenhouse at Xiangyang Base in Heilongjiang Province. Young leaves from the parents and F 2 populations were collected for DNA extraction with the cetyltrimethylammonium bromide (CTAB) method. All fruits of the F 2 and F 2:3 individuals were harvested 50 days after pollination, and the seeds were cleaned and dried for seed-related phenotype analysis. The SL, SW, SNF, and HSW were measured for each offspring.

SLAF Library Construction and
High-Throughput Sequencing cv. Rimu (Sun et al., 2017) was used to design SLAF markers using different enzymes. Then, a pilot SLAF experiment was performed, and the SLAF library was constructed in accordance with the predesigned scheme SLAF Predict (Peking Biomarker Technologies Corporation, Peking, China). For the F 2 population in our study, two enzymes (HaeIII+Hpy166II) (New England Biolabs, NEB, USA) were selected as the most appropriate enzymes to digest the genomic DNA. A single-nucleotide (A) overhang was subsequently added to the digested fragments using the Klenow fragment (3´!5´exo -) (NEB) and dATP at 37°C. Duplex tag-labeled sequencing adapters were ligated to the A-tailed fragments using T4 DNA ligase (ThermoFisher Scientific, MA, USA). PCR was carried out using 25-ml samples containing 100 ng of restriction-ligation DNA samples, 1× reaction buffer, 200 mM dNTPs, 0.5 U of Q5 High-Fidelity DNA Polymerase and 0.5 mM PCR primers (forward primer: 5'-AATGATACGGCGACCACCGA-3', reverse primer: 5'-CAAGCAGAAGACGGCATACG-3') (PAGE-purified, Life Technologies, China). The PCR program was as follows: 94°C for 30 s; 12 cycles of 94°C for 40 s, 65°C for 30 s, and 72°C for 30 s; and 72°C for 5 min. The products were purified using Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, UK) and pooled. Next, fragments ranging from 264 bp to 364 bp in length were purified using the QIAquick Gel Extraction Kit (Qiagen, Germany). Finally, the gel-purified products were diluted to 100 ng/ml, and paired-end sequencing was performed on the Illumina HiSeq 2500 system (Illumina, Inc.; San Diego, CA, USA) according to the manufacturer's recommendations.

Sequence Data Grouping and Genotyping
SLAF marker identification and genotyping were performed using procedures described by Sun et al. (2013). Briefly, lowquality reads (quality score < 20e) were filtered out, and then, the raw reads were sorted to each progeny according to the duplex barcode sequences. After the barcodes and terminal 5-bp positions were trimmed from each high-quality read, clean reads from the same sample filtered by SOAP software (Li et al., 2008) were mapped onto the C. maxima genome sequence (Sun et al., 2017). Sequences that mapped to the same position were defined as one SLAF locus (Zhang et al., 2015b). Then, SNP loci for each SLAF locus were compared between the parents; first, SLAF loci with more than 3 SNPs were filtered out. Alleles of each SLAF locus were defined according to parental reads with a sequencing depth >10-fold, whereas for each offspring, reads with sequencing depth >1-fold were used to define alleles. For pumpkin, one SLAF locus can contain at most 4 genotypes, so SLAF loci with more than four alleles were discarded subsequently. Only SLAFs with two to four alleles were identified as polymorphic markers and considered potential markers. The polymorphic SLAF loci were genotyped to ensure consistency between the parental and offspring SNP loci. Polymorphic markers were classified into eight segregation patterns. Only one segregation type (aa×bb) was used to construct the genetic map because the polymorphic markers were analyzed according to the F 2 population type. Genotype scoring was performed using a Bayesian approach to further ensure the genotyping quality (Sun et al., 2013). Low-quality markers for each marker and each individual were counted, and the worst marker or individual was deleted. The chi-square test was performed to examine the segregation distortion, and markers with significant segregation distortion (P < 0.05) were excluded from the map construction.

Linkage Map Construction and Evaluation
All high-quality SLAF markers were allocated into 20 linkage groups (LGs) based on their genomic locations. Then, the modified LOD (MLOD) scores between markers were calculated to further confirm the robustness of markers for each LG, and markers with MLOD scores < 5 were filtered prior to ordering. To ensure efficient construction of the highdensity and high-quality map, a newly developed HighMap strategy was utilized to order the SLAF markers and correct genotyping errors within the LGs . First, recombinant frequencies and LOD scores were calculated using a two-point analysis and applied to infer linkage phases. Then, enhanced Gibbs sampling, spatial sampling, and simulated annealing algorithms were combined for iterative marker ordering (Jansen et al., 2001;Van, 2011). The updated recombination frequencies were used to integrate the two parental maps, which optimized the map order in the subsequent simulated annealing cycle. A stable map order was obtained after 3-4 cycles, resulting in a high-quality map including 20 LGs. The SMOOTH error correction strategy was performed according to parental contribution to the genotypes (Os et al., 2005), and a k-nearest neighbor algorithm was applied to impute missing genotypes (Huang et al., 2011). Skewed markers were added to this map by applying a multipoint maximum likelihood method. Map distances were estimated using the Kosambi mapping function (Kosambi, 1944). Marker pairs with zero recombination in each LG had the same genetic distance. The data analysis script draw haplotype-map.pl (Peking Biomarker Technologies Corporation, Peking, China) was used to construct the haplotype map, and draw heatmap.pl (Peking Biomarker Technologies Corporation, Peking, China) was used to construct the heat map.

Relationship Between the Genetic and Physical Maps
The physical positions of the SNPs were determined based on alignment with the reference genome sequence of C. maxima (Sun et al., 2017). The collinearity between the genetic and physical positions was determined by plotting each marker's genetic position (in cM) against its physical position (in Mb) using Excel 2007 (Microsoft Corporation, WA, USA). Spearman's correlation coefficients were calculated using the Statistical Analysis System (SAS) program (ND Times, Peking, China).

CAPS of DNA Polymorphisms
To verify the QTL mapping results by SLAF-seq, CAPS markers surrounding the positioning region of the major QTLs associated with SL and SW were selected (at a physical distance from 1 Mb to 5 Mb in chromosome 6). Five hundred-bp sequences surrounding SNPs from the SLAF-seq data were used to design primers for CAPS markers. The PCR products were digested by a restriction endonuclease (i.e., EcoRI, HindIII, PstI, ScaI, BamHI, XhoI) (Thermo Scientific, MA, USA) to verify polymorphisms of the CAPS markers. The primers were designed using Primer Premier 6.0 software with the appropriate CAPS candidate sequences.
The PCR mixture for CAPS amplification contained 20 ng of plant genomic DNA, 10 pmol of the primers, 0.25 mM dNTPs, 10× Taq buffer, and 1 unit of Taq polymerase in a total volume of 20 ml. Touchdown PCR was performed for 7 min at 94°C, followed by 30 cycles of 30 s at 94°C, 30 s at 60°C with stepwise decreases of 0.5°C for each cycle, and 60 s at 72°C; 10 cycles of 30 s at 94°C, 30 s at 45°C, and 60 s at 72°C; and postheating for 7 min at 72°C. The reaction mixture for enzyme digestion contained 5 ml of the PCR product, 3.7 ml of ddH 2 O, 0.3 ml of the restriction enzyme (10 U/ml) and 1 ml of 10× enzyme buffer, which was incubated at 37°C for 3 h. The enzymedigested products were examined via 2% agarose gel electrophoresis. The primer sequences for the 11 polymorphic CAPS markers are listed in Table S1. The enzyme-digested products were examined via 2% agarose gel electrophoresis at 150 V for 30 min. The agarose gel electrophoresis results were photographed using the ChampGel 6000 gel documentation and image analysis system.

Phenotyping and QTL Mapping of Seed-Related Traits
The average SL and SW were measured for 10 randomly chosen seeds from each line with three replications, and SNF and HSW data were collected from 100 F 2 individuals in 2017 for QTL analysis with SLAF-seq. SL and SW were measured for each individual of an additional 150 F 2 individuals in 2017. SL and SW of 10 plants per F 2:3 family in 2018 were measured. The means of SL and SW within each F 2:3 family in 2018 were calculated and subjected to QTL analysis with CAPS markers.
The genotype of each mapped markers in 100 F 2 individuals was analyzed and markers with duplicate genotypes were removed. The R package ASMap (Taylor and Butler, 2017) was used to map QTLs, and QTL analyses were then performed using the R/qtl package with the composite interval mapping (CIM) model (Churchill and Doerge, 1994). The significance of each QTL interval was tested by a likelihood-ratio statistic (LOD) ( Van and Kyazma, 2004). For each trait, the LOD threshold for declaring significant QTLs was established separately with 1000 permutation tests (P = 0.05), which ranged from 2.8 to 3 for the various traits. To be conservative, an LOD score of 3 was used for the QTL detection of all traits. The QTLs were named according to the trait name and chromosome.

Characteristics Between the Two Parents
The crossing parents "2013-12" and "9-6" differed in several traits, including floral sex, fruit shape, fruit color, flesh thickness, flesh color, SL, SW, SNF, and HSW ( Figure 1 and Table S2). The female parent "2013-12" was subgynoecious with an orange-red fruit color, orange-yellow flesh, and small and light seeds. The male parent "9-6" was androecious with a gray fruit color, light-yellow flesh, and large and heavy seeds. The SL (20.35 mm) and SW (14.36 mm) of the F 1 plants were similar to those of the parental "9-6" line, but the SNF (189) and HSW (36.97 g) of F 1 were greater than those of the parental lines, with transgressions of 63.36% and 39.25%, respectively. Phenotypic data of SL, SW, SNF, and HSW were collected using 100 F 2 individuals in 2017 for QTL mapping with SLAF-seq. Phenotypic data of SL and SW were collected using 150 F 2 populations in 2017 and 150 F 2:3 individuals in 2018 for QTL analysis with CAPS markers. Detailed phenotypic and genotype data of the F 2 and F 2:3 populations are presented in Tables S2 and S3. Seed traits in 100 F 2 individuals present continuity variance and follow the normal distribution. According to seed trait data from the parents and 100 F 2 individuals, correlation analysis showed that SL was significantly correlated with SW (r = 0.627), which indicated that long seeds always appeared together with wide seeds.

SLAF-Seq Data and SNP Markers
DNA sequencing generated a total of 100.4 Gb of raw bases. Each read was ∼100 bp in length and 473.35 Mb paired-end reads were obtained after SLAF library construction and highthroughput sequencing. Of these reads, 28.81 Mb reads were from the male parent, 29.41 Mb were from the female parent, and an average of 4.15 Mb were from the 100 F 2 individuals ( Table 1). The raw data of the two parents and 100 F 2 individuals have been deposited to the National Center for Biotechnology Information (NCBI) and can be accessed in the Short Read Archive (SRA) sequence database under accession number PRJNA549786. Among these reads, 95.08% of the bases were of high quality, with quality scores of at least 30 (Q30). The average GC content was 40.83%, which suggested a normal distribution, and no AT or GC segregation was found in the reads. The sequencing depth of the 100 F 2 individuals ranged from 1.61-to 6.55-fold with an average sequencing depth of 3.20fold ( Figure S1), which revealed that the sequencing results were reliable for marker exploration.
A total of 584,994 SLAFs were developed, of which 195,305 showed polymorphisms, with a polymorphism rate of 33.39% ( Table S4). The number of SLAF markers per LG ranged from 17,334 (LG8) to 43,545 (LG4). A total of 178,376 high-quality SLAFs were detected in the offspring, of which the average sequencing depth was 29.01. In the comparison of the sequencing data from the two parents and offspring, a total of 1,437,901 SNPs were identified. Among these SNPs, the numbers of SNPs in "2013-12" and "9-6" were 690,456 and 587,028, respectively, and the average number of SNPs in each F 2 individual was 160,417 ( Table 2). The number of SNP markers per LG ranged from 29,881 to 77,881. SNPs with the same nucleotides for each allele depth were termed homo-SNPs. A total of 1,197,388 homo-SNPs were classified into eight segregation patterns following the CP group data format of Joinmap4.1 software (Van, 2011) and genotype encoding rules (Zhu, 2015) ( Figure S2). The 365,639 SNPs with an aa×bb segregation pattern in the F 2 population were selected to construct a linkage map because the F 2 population was derived from a cross between two homozygous parents with genotype aa or bb.

Genetic Mapping
Finally, 8,622 markers were used to construct the genetic map based on a sequencing depth in the parents of more than 10-fold, the segregation distortion criteria (P < 0.01), and less than 25% missing marker data in the F 2 population. After removing lowcollinearity markers confirmed by calculating MLOD scores between neighboring markers, 8,406 of the 8,622 SNP markers were mapped onto 20 LGs (Table 3, Figure S3). The 8,406 marker sequence and genotypes of markers in F 2 individuals are presented in Table S5 and Table S6. The genetic map spanned a total of 3376.87 cM, with an average distance of 0.47 cM between adjacent markers. The average number of markers in each LG was 420.3, and markers spanned an average length of 168.84 cM. The number of mapped SNP markers ranged from 134 (LG16) to 865 (LG11), and the average distance ranged from 0.21 cM (LG16) to 0.99 cM (LG11). LG12, which was the largest LG (196.35 cM genetic length), contained 490 markers and had an average marker density of 0.40 cM. LG9, which was the smallest LG (121.95 cM genetic length), contained 241 markers and had an average marker density of 0.51 cM. On the genetic map, the largest gap was 11.49 cM and was located in LG14. Gaps <5 cM constituted 99.30% of the total LGs, which showed good uniformity in the distribution.
The total number of reads for the markers used to construct the genetic maps in the two parents were 134,754 and 148,231, and the average depths were 16.03 and 17.63, respectively ( Table S7). The total marker depth in the F 2 population ranged from 24,382 to 428,849, with a depth of 216,408 in each offspring. The average marker depth in the F 2 population ranged from 6.11 to 51.13, with a depth of 26.24 in each offspring ( Table S8). The high marker depth could improve the accuracy of marker locations on the genetic map.

Evaluation of the Genetic Map
Haplotype and heat maps were used to evaluate the quality of the genetic map. Potential genotype and marker order errors can be FIGURE 1 | Different phenotype of two parents. A, C, E, and G are the plant type, fruit and seed phenotype of "2013-12". B, D, F, and H are the plant, fruit and seed phenotype of "9-6". (A) "2013-12" is subgynoecious type; (B) "9-6" is androecious type; (C) "2013-12" has orange-red fruits; (D) "9-6" has gray fruits; (E) "2013-12" has small and light seeds; (F) "9-6" has big and heavy seeds; (G) "2013-12" shows orang-yellow and thin flesh; (H) "9-6" shows light-yellow and thick flesh. reflected by a haplotype map of the genetic map. Haplotype maps were generated for each F 2 individual and the parental controls using SNP markers. More double crossovers indicated more genotype and marker order errors. As shown in Figure S4, most regions in the haplotype map of all F 2 individuals had a common origin, which indicated that a high-quality genetic map was constructed with the 8,406 SNPs. The relationship of recombination between markers from one LG was used to determine the potential ordering errors of the markers. Based on the heat map of the 20 LGs, a strong linkage relationship between two adjoining markers in the 20 LGs was distinctly visible ( Figure S5). As the distance between the two markers increased, the linkage relationship between them weakened, which illustrated the correct order of markers in the LGs.
The correlation of genetic and physical positions is an important factor for evaluation of genetic maps. Spearman's correlation coefficients between the genetic and physical maps for each LG were calculated to analyze the collinear relationships between the maps (Table S9). Spearman's correlation coefficient of each LG ranged from 0.98 to 1, which indicated high collinearity of the genetic and physical maps. As shown in Figure S6, marker arrangement on the genetic map was highly consistent with that on the physical map, which indicated good collinearity of the maps and high accuracy of genetic recombination on the genetic map.

QTL Analysis
Among 8,406 markers, 2,041 markers with different genotypes were used for QTL mapping. The LGs and positions in the LGs of 2,041 markers are listed in Table S10. According to the genotype of the F 2 population from the SLAF-seq data, the QTLs for SL, SW and HSW were distributed at 10 positions in six LGs, while no significant QTLs were detected in 20 LGs for SNF. The seed traits, LGs, position intervals, starting markers, ending markers, peak markers, LODs, PVEs, additive values, and dominance values are shown in Table 4 and Figure 2. QTLs that could explain more than 15% of the PVEs were considered major-effect QTLs. Four QTLs were detected for SL and were located in four LGs (LG4, LG6, LG17, and LG18). These QTLs explained 7.0-38.6% of the variance. The minor-effect QTLs SL4-1, SL17-1, and SL18-1 together explained 30.7% of the variance. The majoreffect QTL SL6-1 was situated between markers Marker157285 and Marker158784 (from 37.9 cM to 42.2 cM) with a peak at Marker158406, and explained 38.6% of the variance with an LOD value of 10.6. Four QTLs for SW were distributed over LG4, LG5, LG6, and LG8 and explained 6.9%-28.9% of the variance. The major-effect QTL SW6-1 was situated between markers Marker154765 and Marker156546 (from 28.4 cM to 33 cM) with a peak at marker Marker155361 and explained 28.9% of the variance with an LOD value of 7.9. Three minor-effect QTLs, SW4-1, SW5-1, and SW8-1, together explained 30.8% of the variance. Another two QTLs for HSW were located on LG6 and LG17. One major-effect QTL, HSW17-1, was situated between markers Marker424927 and Marker424340 (from 99.6 cM to 111.9 cM) with a peak at marker Marker424492, and explained Total SNP markers, total distances, average distances, maxs gap and Gap < 5 cM in each linkage group are shown. Numbers of the C. maxima in each LG are also shown. Gap < 5 indicated the percentages of gaps in which the distance between adjacent markers is smaller than 5 cM. 17.2 of the variance with an LOD value of 5.4. In addition, SL6-1, SW6-1, and HSW6-1 were located at the same LG.

Linkage Map Construction With CAPS Markers
According to the QTL results of SLAF-seq, the major QTL of SL and SW was located at a physical location of 28.4 cM to 42.2 cM in LG6 (with a physical distance of 1.82 Mb to 2.92 Mb). To verify the location results for SL and SW, 15 pairs of CAPS markers were designed from 1.00 Mb to 5.00 Mb in LG6. Eleven pairs of the CAPS marker showed clear, repeating, polymorphic bands for "2013-12", "9-6" and their F 1 progeny. The 500-bp sequences surrounding SNPs for eleven CAPS markers are presented in Table S11. CAPS production bands consistent with "2013-12" were marked as A, those consistent with "9-6" were marked as B, and those consistent with F 1 were marked as H. We genotyped 150 F 2 plants with these 11 markers (Table S3); the LOD profiles of the QTLs for the SL and SW traits are illustrated in Figure 3 and Table 5. The QTL SL6-1F2 was situated between markers M2374213 and M3507675, with an LOD score of 3.6 and a phenotypic variation of 22.2%. The QTL SW6-1F2 was identified between markers M1468248 and M1929605; this QTL accounted for 20.8% of the total variation and had an LOD score of 4.8. No significant QTLs were detected in the F 2:3 population for SL and SW, while the peak regions of QTL SL6-1F2:3 and SW6-1F2:3 were located in regions similar to those of the peaks of SL6-1F2 and SW6-1F2.

Predicted Genes for Seed Length and Seed Width
To efficiently select predicted genes controlling the SL and SW traits, the coding genes of the SL-associated region (from 2.52 Mb to 2.92 Mb) and SW-associated region (from 1.82 Mb to 1.93 Mb) in LG6 were analyzed. Protein structures and functions could be influenced by a change in a single nucleotide in the coding region. According to the SNPs and InDels from the SLAF-seq data, genes with missense codons between the two parents were selected, and twenty-two genes were identified in the candidate regions of the major QTLs for SL and SW. The gene ID and probable function information are listed in Table 6. Fifteen genes were annotated in SL-associated regions, and seven genes were annotated in SW-associated regions. According to seed-traits related gene descriptions (Sun et al., 2017), one gene encoding a VQ motif (CmaCh06G005530.1, from 2628780 bp to 2629112 bp in LG6) and one gene encoding E3 ubiquitin-protein ligase (CmaCh06G005450, from 2581752 bp to 2588335 bp) with a conserved domain regulates SL in other plant species (Song et al., 2007;Wang et al., 2010). Another gene encoding an F-box and leucine rich repeat (LRR) domains-containing protein (CmaCh06G004140.1, from 1917891 bp to 1923829 bp) with a conserved domain regulates SW in several plant species (Luo et al., 2005). The amino acid sequence alignment between "2013-12" and "9-6" for these genes is shown in Figure S7. The amino acid sequence differences in CmaCh06G005450 and CmaCh06G005530.1 between the two parental lines occurred in a conserved sequence (amino acid residues 5 to 333 aa and 14 to 75 aa, respectively). The amino acid sequence differences in CmaCh06G004140.1 between the two parental lines occurred in anN-terminal C2 conservative domain (11-140 aa) and a structural maintenance of chromosomes (SMC) -super family conservative domain (276-1026 aa), which could affect protein function. These results indicate that CmaCh06G005450, CmaCh06G005530.1, and CmaCh06G004140.1 might be good candidates to explain differences in SL and SW between the parents and offspring. Further experiments are needed to confirm the functions of these genes in Cucurbita.

DISCUSSION
SL, SW, SNF, and HSW have significant effects on squash yield and seed quality. Breeders prefer high seed yields and heavy seeds. The F 1 plants derived from a cross between the "2013-12" line and "9-6" lines exhibited over-parent heterosis in SNF and HSW. The "9-6" line could provide a new allele source for pumpkin breeding with large and heavy seeds. High-density genetic maps are a highly valuable tool for mapbased cloning and MAS. This study reports the construction of the first high-density linkage map of C. maxima using SNPs identified by SLAF-seq technology. When we compared the sequencing data from the two parents and offspring, a total of 584,994 SLAF markers and 1,437,901 SNPs were identified, and the average sequencing depth was 29.01. A number of markers will be beneficial for gene determinations and molecular cloning in pumpkin breeding. The genetic map studied herein spanned a total of 3,376.87 cM, and the linkage map in our study exhibited 20 LGs with approximately the same physical distances as the reference genome (Sun et al., 2017). It was indicated that the genetic map and SLAF markers in our study were of good quality. A total of 8,406 SNPs were used to construct a genetic map in 20 LGs with an average distance of 0.47 cM between adjacent markers, which meant that more markers were identified here than for C. maxima with 458 bin markers (Zhang et al., 2015b), C. moschata with 3,470 SNPs (Zhong et al., 2017) and C. pepo with 7,718 SNPs (Monteropau et al., 2017). The mean distance between markers in the genetic map in our study was shorter than those for C. maxima, with a mean marker density of 5.60 cM (Zhang et al., 2015b), and C. moschata, with a mean marker density of 0.89 cM (Zhong et al., 2017), and was almost as short as that for C. pepo, with a mean marker density of 0.4 cM (Monteropau et al., 2017). The published high-density genetic maps of Cucurbita were constructed using genotyping-by-sequencing (GBS) (Zhang et al., 2015b;Monteropau et al., 2017) and double-digest restriction-associated DNA (ddRAD) libraries (Zhong et al., 2017). GBS and ddRAD genotyping is performed by randomly interrupting genomic DNA with restriction enzymes, whereas SLAF-seq is performed by sequencing the paired ends of sequence-specific restriction fragments (Xu et al., 2015;Zhang et al., 2015b;Zhong et al., 2017). Therefore, SLAF-seq provides better repeatability and deeper sequencing than the above methods.
High-density markers and sites are necessary for map-based cloning of genes. In the genetic map of our study, LG9 was the largest LG (196.35 cM) and contained 490 markers, whereas LG12 was the smallest (121.95 cM) and contained 241 markers. On the map of C. maxima published by Zhang et al. (2015b), the smallest LG was LG10 (47.61 cM), which harbored 10 markers  on 1 oriented scaffold, and the largest LG was LG4 (268.00 cM), which harbored 41 markers on 2 oriented scaffolds. These two genetic maps of C. maxima showed little consistency in LG length. The evenly distributed SNPs and high-density sites studied herein were developed with SLAF-seq and HighMap methods. The haplotype map, heat map and Spearman correlation coefficients of the genetic map further ensured the SNP location accuracy in the LGs. On the published high-density linkage map of C. maxima, the distance between neighboring markers ranged from 0.05 cM to 44.89 cM, and 8 large gaps (≥18 cM) were detected in LG2, LG3, LG6, LG7, LG11, LG15, and LG19 (Zhang et al., 2015b). On the published high-density linkage map of C. moschata, the largest interval gap was 22.30 cM, and large gaps were detected in LG8, LG13, LG16, and LG19 (Zhong et al., 2017). The large gaps and low sequencing coverage have limited further fine mapping and MAS in Cucurbita. Therefore, additional markers and highcoverage genetic maps are needed. In our study, the largest interval gap was only 11.49 cM (in LG14), and gaps <5 cM constituted up to 99.30% of the total LG, which illustrated successful construction of a map with improved coverage. There may be two possible explanations for the high-coverage genetic map. First, the two elite parents used for this map presented significant differences in several traits (Figure 1), which indicated the existence of great genetic diversity between the parents. Second, to obtain SLAF markers with few repeated sequences that are evenly distributed in 20 LGs, the squash genome was used to design marker discovery experiments using different enzymes. Two enzymes were used to digest the genomic DNA in this experiment, which ensured that SLAF markers were effectively discovered on a genomic map.
Based on differences in seed trait composition between the two parents in our study, we identified a total of ten QTLs for SL, SW and HSW with the CIM method. Four QTLs were found for SL, four QTLs were found for SW, and two QTLs were found for HSW in our study. No significant QTLs for SNF were detected in our study, most likely because of a slight difference between the parents "2013-12" (116 SNF) and "9-6" (105 SNF). In cucumber and watermelon, 12 QTLs were found for SL with PVE values ranging from 2.20% to 28.90%, 12 QTLs for SW with PVE values ranging from 2.24% to 24.10%, and 10 QTLs for HSW with PVE values ranging from 4.50% to 28.30% (Wang et al., 2014;Zhou et al., 2016). In our study, major QTL SL6-1 was homologous to Chr3, with a physical distance of 28.29 Mb-32.43 Mb in cucumber (Qi et al., 2013), which is near the SSR22874 marker of SL3.1 (at a physical distance of 34.08 Mb) in cucumber (Wang et al., 2014). These results indicated that SL is controlled by homologous genes among species. Homologous regions of major QTLs for SW and HSW in cucumber or watermelon (Zhou et al., 2016) were located on different chromosomes or in different locations, in contrast to published QTLs in cucumber and watermelon (Wang et al., 2014;Zhou et al., 2016). Thus, multiple genes may control SW and HSW in different cucurbit species.
According to the seed size data, we found that long seeds always appeared together with wide seeds in the offspring in our study (Table S2). Further analysis showed that SL was significantly correlated with SW. The QTL of SL-1 was in a region similar to that of SW6-1 in the same LG, resulting in similar colocalization between SL and SW. It would benefit breeding for large seeds. In published studies of seed traits, the major QTLs for SL and SW in watermelon were colocalized in the same LG (Meru and McGregor, 2013;Zhou et al., 2016), which was similar to our results. Four significant QTLs for SL and four significant QTLs for SW were identified, and the flanking markers for these QTLs could be used in molecular MAS to increase the seed size in squash seedlings during the selection of hybrid offspring.
Quantitatively inherited traits can be influenced by the environment and mapping population. In our study, we detected seed size traits (seed length and seed width) with 100 F 2 populations by the CIM method from SLAF-seq data in 2017 (the related QTLs were named SL4-1 to SL17-1 and SW5-1 to SW8-1), 150 F 2 populations by CAPS markers in 2017 (the QTLs were named SL6-1F2 and SW6-1F2), and 150 F 2:3 individuals by CAPS markers in 2018 (the QTLs were named SL6-1F2:3 and SW6-1F2:3).The peaks of QTLs SL6-1F2 and SL6-1F2:3 were collected from the same marker, and the peaks of QTLs SW6-1F2 and SW6-1F2:3 were also collected from the same marker. The major QTL SL6-1 mapped with 100 F 2 individuals had an overlapping region with SL6-1F2 mapped with additional 150 F 2 individuals, indicating that this region had high LOD scores and PVE values in different individuals from the same year. On the other hand, the QTL SL6-1F2:3 mapped with 150 F 2:3 individuals had a low LOD value, which might have been influenced by the different growing environments in the different years. Similar conditions were found in SW QTLs. The overlapped region of SL6-1 and SL6-1F2 (from 2.52 Mb to 2.92 Mb) was the SL-associated region, and the overlapped region of SW6-1 and SW6-1F2 (from 1.82 Mb to1.93 Mb) was the SWassociated region. Closely linked CAPS markers (M2374213, M3507675, M1468248, and M1929605) could be used for MASbased breeding in C. maxima for seed elongation and the radius. With regard to the available SLAF-seq data, crossed parents with differences in multiple phenotypes were used in our study, and the QTLs controlling important agronomic traits were mapped. Only 100 F 2 individuals were used to construct the SLAF library and map complex seed-related traits in C. maxima, and accurate QTL mapping results were obtained. Therefore, SLAF-seq is a useful gene mapping technology for small populations and exhibits a high success rate and stability. The early growth of endosperm is coordinated with the seed coat growth and plays a direct role in determining the ripened seed size (Garcia et al., 2005;Ingram, 2010). The seed size of Arabidopsis is regulated by the IKU-MINI pathway, which is thought to control endosperm growth. The IKU-MINI pathway includes IKU1, which encodes a protein containing a VQ motif; and MINI3, which encodes a WRKY transcription factor (Wang et al., 2010). IKU2 also encodes an LRR kinase and shows a recessive mode of action causing reduction in endosperm growth accompanied by precocious cellularization and reduced seed size in Arabidopsis (Luo et al., 2005). In the SL-associated region, we found CmaCh06G005530.1, which encodes a protein containing a VQ motif. We found one amino acid difference in CmaCh06G005530.1 between the two parental lines in a conserved sequence ( Figure S7B). In the SW-associated region, CmaCh06G004140, encoding an F-box and LRR domaincontaining protein, had several amino acid differences between the two parental lines in a conserved domain of SMC, which controls the cell cycle control and cell division ( Figure S7C). These findings suggested that CmaCh06G005530.1 and CmaCh06G004140 were the candidate genes responsible for seed length and width, and the seed size of C. maxima was also regulated by the IKU1 and IKU2 pathway. One gene, GW2, encoding a protein with E3 ubiquitin ligase activity, was found to affect the grain size, weight and yield by controlling the endosperm size in rice (Song et al., 2007). Loss of GW2 function could increase cell numbers, resulting in increased spikelet hull size and enhanced grain size. In the SL-associated region, CmaCh06G005450 was selected, which encodes E3 ubiquitin-protein ligase. One amino acid difference in CmaCh06G005450 between the two parental lines was observed in a conserved sequence ( Figure S7A). This finding indicated that the CmaCh06G005450 gene might affect seed size, which is thought to control cell numbers in our Cucurbita materials. The SNP information from SLAF-seq was limited compared to the whole-genome resequencing results, and some regions within the QTL interval were not sequenced. Coding genes without nonsynonymous SNPs in major QTL regions were also considered candidate genes. CAPS markers M2374213, M3507675, M1468248, and M1929605 were used for single segment substitution line (SSSL) construction. Based on it, further fine mapping of SL and SW genes and gene expression during the seed development stage will be performed in our laboratory.

CONCLUSION
By using 100 F 2 populations from two morphologically diverse parents, a high-density genetic map of C. maxima was constructed by SLAF-seq. With this map, 10 QTLs for seedrelated traits were identified, and major QTLs for SW, SL, and HSW were detected in C. maxima for the first time. An additional 150 F 2 and F 2:3 populations were used to map SW and SL with CAPS markers, and the results indicated that the QTL mapping using SLAF-seq was accurate. Nonsynonymous SNPs and gene descriptions of the SL-and SW-associated regions suggested that the genes encoding a VQ motif gene, E3 ubiquitin-protein ligase and an F-box and LRR domain-containing protein might be associated with SW and SL in C. maxima. In summary, the highdensity genetic map studied herein is a valuable tool for association mapping of important agronomic traits, map-based gene cloning and MAS breeding in Cucurbita.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the the National Center for Biotechnology Information (NCBI), accession number PRJNA549786.

AUTHOR CONTRIBUTIONS
YW performed data analysis, preparing the manuscript. CW contributed to collecting phenotypic characteristics and DNA extraction. HH contributed to CAPS demonstration test. WX and ZW contributed to growing plants. YL and CY contributed to providing experimental material. SQ, the corresponding author, oversaw all activities related to the project implementation and manuscript development.All authors read and approved the final version of the manuscript.