Construction of a High-Density Genetic Map and Identification of Loci Related to Hollow Stem Trait in Broccoli (Brassic oleracea L. italica)

A high-quality genetic map is important for mapping of compound traits. In this study, a genetic map was constructed based on the reference genome TO1000 after specific locus amplified fragment (SLAF) sequencing in a double-haploid segregation population of broccoli, and loci controlling hollow stem trait were identified in the genetic map. The genetic map contains 4,787 SLAF markers, with a mean marker distance of 0.22 cM and the mean sequencing depths of 91.14-fold in the maternal line, 88.97-fold in the paternal line and 17.11-fold in each DH progeny. A locus controlling the hollow stem trait, QHS.C09-2, which could explain 14.1% of the phenotypic variation, was steadily detected on the linkage group nine in the indicated data of 3 years’ trials and BLUE analysis. The genetic map could lay an important foundation for mapping of compound traits, and mapping of hollow stem trait would be basis to clone the genes related to hollow stems in broccoli.


INTRODUCTION
Broccoli (Brassica oleracea var. italica) is a significant vegetable crop, whose flower heads are rich in vitamins, minerals and secondary metabolites such as glucoraphanin (Bjorkman et al., 2011;Natella et al., 2016;Sarvan et al., 2017). During 2014-2016, global production quantity and area harvested of broccoli and cauliflower (B. oleracea L. var. botrytis) were both increasing from ∼24.37 to ∼25.32 million tons, from ∼1.29 to ∼1.34 million hectares, respectively 1 . Most of broccoli commercial cultivars were bred by traditional breeding methods in the past decades. Classic breeding methods have constraints and difficulty to analyze compound traits. Modern breeding technology including molecular marker assisted selection (MAS) has merits in analyzing compound traits. High-density genetic maps are important foundation to analyze complex trait and MAS for breeders.
Broccoli is one varietas of B. oleracea (Cheng et al., 2016). There have been some mapping populations and associated linkage maps construction in B. oleracea, such as cabbage, broccoli, and cauliflower (Walley et al., 2012;Wang W. et al., 2012;Zhao et al., 2016), and genetic loci for some traits like leaf trichome and male sterile in cabbage (Liang et al., 2017;Mei et al., 2017) and purple leaf trait in ornamental kale  were mapped correspondingly. The genomes of B. oleracea var. capitata line 02-12 and kale-like TO1000 were sequenced and assembled successfully (Liu S. et al., 2014;Parkin et al., 2014), which are very important for B. oleracea crops. Their genome sequences have been used as reference to construct genetic map in cauliflower, broccoli and ornamental kale (Zhao et al., 2016;Branham et al., 2017;Liu X. et al., 2017).
Hollow stems often happen in broccoli commercial production. Inside some broccoli stems, pith tissues near cavities would become blown or even black. This kind of broccoli flower head loses attraction to consumers. Therefore the occurrence of hollow stems could cause severe economic losses to producers. A hollow stem was considered as a physiological disorder in the inner pith tissues of broccoli (Boersma et al., 2013). Liu (2009) reported that the trait of hollow stems was controlled by multi-genes. But its related genes or loci are still unknown. In the study, a high-quality genetic map was constructed based on the reference genome TO1000 in a broccoli double-haploid (DH) segregation population and loci controlling hollow stem trait were identified in the genetic map.

Plant Materials and Genomic DNA Extraction
A segregation population with 127 DH lines was achieved by microspore culture of the hybrid which came from a cross between DH lines DH16-2 and DH28-4. The maternal line DH16-2 is big type, and has a heavy flower head, a hollow stem, few branches, small buds, and self compatibility, while the paternal line DH28-4 is compact-type, and has a light flower head, a solid stem, heavy branches, big buds, and strong self incompatibility. The parents and their DH progenies were planted in the Yangdu experimental greenhouse of Zhejiang Academy of Agricultural Sciences and self-pollinated to collect seeds for reserving materials and further phenotype surveying. The phenotype data were from the same individuals assessed three times. In order to avoid the contamination by the pollens from other lines, hands and tools were sterilized with 70 percent ethanol and flowers would be separated by bagging during selfpollination.
Clean young leaves of all offspring and the parent were collected and their whole genomic DNAs were isolated according to the CTAB method. Electrophoresis on 0.8% agarose gel and a spectrophotometer (NanoDrop, United States) were employed to detect the DNA quality and concentration.

Phenotype Survey and Data Statistics
Double-haploid population, the hybrid and their parents were planted in the Yangdu experimental greenhouse in 2014, 2015, and 2017. The overground stem segments were harvested and cut longitudinally in half (Figure 1). Their stem traits were surveyed as hollowness or no-hollowness. There were two replicates per field trial, and there were three plants per replicate. As there were many phenotype data absent due to bad weather in 2014, which affected the QTL analysis using the year's data. When the environment in different years changes a lot, the phenotype data will have a certain errors or biases and cause that no locus could be steadily detected. Best Linear Unbiased Evaluation (BLUE) is a kind of statistical approach Pan et al., 2017), which could solve this problem, as it could give the lowest variance of the estimate linear estimators and the errors do not need to be normal, nor do they need to be independent and identically distributed. In this case, the indicated data of 3 years' trials and BLUE value were employed to analyze QTL. The heritability was calculated as Shen et al. (2018) mentioned.

SLAF Library Construction and High-Throughput Sequencing
An optimized SLAF-seq strategy was utilized in this study. Two enzymes Hpy166II + Hae III (New England Biolabs, NEB, United States) were used to digest the genomic DNA from the parent and population individuals. Then, by Klenow Fragment (3 → 5 exo-) (NEB) and dATP at 37 • C, a single nucleotide (A) overhang was ligated with the digested fragments. Subsequently, the A-tailed fragments were connected to duplex tag-labeled sequencing adapters (PAGE-purified, Life Technologies, United States) by T4 DNA ligase. Polymerase chain reaction (PCR) was carried out using dNTP, diluted digestion-ligation DNA samples, Q5 R High-Fidelity DNA Polymerase and PCR primers (Forward primer: 5 -AATGATACGGCGACCACCGA-3 , reverse primer: 5 -CAAGCAGAAGACGGCATACG-3 ). Next, PCR products were depurated using Agencourt AMPure XP beads (Beckman Coulter, High Wycombe, United Kingdom) and pooled. Pooled samples were separated by 2% agarose gel electrophoresis. Fragments ranging from 364 to 464 base pairs (with indexes and adaptors) in size were excised and purified using a QIAquick gel extraction kit (Qiagen, Hilden, Germany). Gel-purified products were then diluted. And pairend sequencing (Each end 125 bp) was performed on an Illumina

SLAF Markers Identification Based on Reference Genome and Genotyping
Specific locus amplified fragment markers were developed and genotyped using procedures described by Sun et al. (2013). Briefly, low-quality reads (quality score < Q30; indicating 0.1% chance of a base error) were filtered out and raw reads left were sorted to each progeny on the basis of duplex barcode sequences. After the barcodes and the terminal 5-bp positions were deleted from each high-quality reads, clean reads from the same sample were mapped onto the Brassica oleracea TO1000 genome sequence 2 using BWA software (Li and Durbin, 2 http://plants.ensembl.org/Brassica_oleracea 2009). Sequences mapped to the same position were defined as one SLAF locus (Zhang et al., 2015). Then, single nucleotide polymorphism (SNP) loci of each SLAF locus were moved out between parents, and SLAFs with more than three SNPs were discarded firstly. Alleles of each SLAF locus were then defined according to parental reads. For diploid species, one SLAF locus can contain at most four genotypes, so SLAF loci with more than four alleles were considered as repetitive SLAFs and then removed. Only SLAFs with two to four alleles were defined as polymorphic and considered as potential markers. All polymorphism SLAFs loci were genotyped with consistency in the parental and offspring SNP loci. The marker codes of the polymorphic SLAFs were analyzed according to the population type DH, which consist of one segregation types (aa × bb).
In order to further ensure the genotyping quality, genotype scoring was executed by a Bayesian approach (Sun et al., 2013). Firstly, a posteriori conditional probability was computed using the coverage of each allele and the number of SNP. Then, qualified markers were selected according to genotyping quality score translated from the probability. Low-quality markers for each marker and each individual were counted and the worse markers or individuals were deleted during the dynamic process. When the average genotype quality scores of all SLAF markers reached the cutoff value, the process stopped. High-quality SLAF markers for the genetic mapping were filtered by the following criteria. First, these sequences with depth of the parents below eight folds, or more than five SNPs were deleted. Second, markers which cover genotypes less than 60% of all offspring FIGURE 2 | Distribution of the total SLAFs on the reference genome TO1000. The x-axis and the y-axis represent chromosome length and code, respectively. Each yellow bar stands for a chromosome, and deeper color from yellow to black means more SLAFs on the corresponding location. were discarded. Thirdly, the Chi-squared test was performed to examine the segregation distortion. Markers with significant segregation distortion (P > 0.05) were initially excluded from the map construction and were then added later as accessory markers.

Genetic Map Construction
Markers loci were partitioned primarily into LGs by the modified logarithm of odds (MLOD) scores > 5. Markers with MLOD scores < 5 were wiped out prior to ordering. A newly developed HighMap strategy was used to order the SLAF markers and correct genotyping errors within LGs in order to obtain a high-density and high-quality map (Zhang et al., 2015). Firstly, two-point analysis was used to analyze recombinant frequencies and LOD scores. Then, an iterative process of marker ordering was conducted by simulated annealing algorithms, enhanced Gibbs sampling and spatial sampling (Jansen et al., 2001;. At the beginning of the ordering procedure, SLAF markers were chose by spatial sampling. One marker was picked up randomly according to a priority order of test cross, and any marker whose recombination frequency is smaller than a given sampling radius is deleted. Then, simulated annealing was employed to seeking the best map order. Summation of adjacent recombination fractions was obtained referring to Liu S. et al. (2014). When the new map order is rejected in the continuous steps, the annealing system will stop. Multipoint recombination frequencies of the parents were evaluated by blocked Gibbs sampling after the best map order of sample markers were generated . The renewed recombination frequencies were integrated into the map to optimize the map order in the followed cycle of simulated annealing. When a steady map order was obtained, the next map building round would be begun. All the markers were mapped appropriately by the mapping algorithm. Then, the error correction strategy of SMOOTH was disposed according to parental contribution of genotypes (Van Ooijen, 2011), and missing genotypes were imputed by a k-nearest neighbor algorithm (Van Os et al., 2005). Skewed markers were then added into this map by applying a multipoint method of maximum likelihood. Map distances were estimated using the Kosambi mapping function .

QTL Analysis
Detection of QTL for hollow stem trait according to the 3 year data was performed by IciMapping v 4.0 software. The phenotype FIGURE 3 | Distribution of polymorphic SLAFs on the reference genome TO1000. The x-axis and the y-axis represent chromosome length and code, respectively. Each yellow bar stands for a chromosome, and deeper color from yellow to black means more SLAFs on the corresponding location.
Frontiers in Plant Science | www.frontiersin.org values of the 127 DH lines in 3 years were employed for QTL mapping for multi-environmental trials. The method ICIM-EPI was used for detecting the additive and epistatic QTL. The walking speed was 0.1 cM, and P was 0.001 in stepwise regression.
The LOD threshold was set by permutation analysis based on 1000 permutations. BLUE values from 3 years were analyzed by MapQTL 5.0 software. The map was scanned at 1-cM intervals, the maximum FIGURE 4 | The broccoli genetic map based on the reference genome TO1000. The x-and y-axis are for linkage group and genetic distance, respectively. Red line in the map is for partial separation markers, and the black one is for normal markers. LOD with at least greater than two score along the interval was regarded as the location of the QTL, and the area in the LOD score greater than the threshold was regarded as the confidence interval. The LOD score threshold was primitively set at 3.0 for QTL declaration, and a QTL that was greater than this LOD threshold was considered as a potential QTL. If any relevant QTL was recognized, the LOD score threshold was determined using the 1,000-permutation test with a confidence of 0.99. QTLs and LOD scores greater than the threshold at a confidence of 0.99 were declared significant. If no any relevant QTL was identified, the threshold value corresponding to 0.95 of confidence was considered. If still no relevant QTL was identified, 0.90 of confidence was set. If there is no result, the PT test will not be taken into account and the threshold will be lowered to 2.5. If there is no interval for 2.5, the threshold will drop to 2.0.

SLAF Sequencing Raw Data Statistics
In this study, leaves from 127 DH individuals in the segregation population and their parent DH lines were employed to extract DNA and further experiments. By counting the proportion of residual restriction sites in reads inserts, the efficiency of normal digestion was 88.54 percent, which indicated that the digestion efficiency was normal. In total, approximate 81.7 G bp data with 408.7 M reads were produced through the highthroughput sequencing, of which 40.03 percent GC (guaninecytosine) content and 95.02 percent high-quality reads (quality scores > 30) ( Table 1). All the raw data has been submitted to National Center of Biotechnology Information (NCBI), the BioProject ID was PRJNA449775.

SLAFs Development According to the Reference Genome
All clean reads were aligned with the TO1000 reference genome sequence using BWA software. Among them, 75.99% of all clean reads were mapped on the reference genome. Totally, 24.01% of reads weren't mapped. 185,349 high-quality SLAFs distributed throughout the reference genome were developed (Figure 2 and Supplementary Table S1). Those SLAFs were located successfully on the reference genome TO1000. The average sequencing depths of these SLAFs were 53.33-, 56.13-, and 12.87-fold in maternal DH line (DH16-2), paternal DH line (DH28-4) and each progeny of the DH population, respectively ( Table 2). Then, 185,349 SLAFs were classified as polymorphic, non-polymorphic and repetitive types according to polymorphism analysis of the allele numbers and difference   Table S2). Polymorphic SLAFs were distributed even on the genome (Figure 3 and Supplementary Table S1). 29,215 polymorphic SLAFs were genotyped with consistency in the parental and DH offspring. Finally, 20,250 polymorphic SLAFs loci were genotyped successfully.

Map Construction
In order to get a high-quality map, polymorphic SLAFs with lower-quality or more than five SNPs or severe partial separation or covering less 60% separate individuals were filtered. Finally, 4787 high-quality SLAFs were obtained for constructing a genetic map which had the average sequencing depths of 91.14-fold in the maternal line, 88.97-fold in the paternal line and 17.11-fold in each DH progeny ( Table 2). Those markers with MLOD values less than three were filtered out. Then, all 4787 SLAFs were distributed even on nine LGs. Linear arrangements of markers among each linkage group and genetic distances between them were analyzed using Highmap software. Eventually, a genetic map covering the reference genome 442.34 Mb with total genetic distance 798.61 cM was constructed (Figure 4 and Table 3). This map contained 9,367 SNPs, with an average genetic distance 0.22 cM and a max gap 16.45 cM. There were gaps larger than 10 cM on LG7 and LG9. And there were 768 markers showing segregation distorting, 0.02 percent of singleton and 1.75 percent of miss. The average markers integrity of the map reached 90.00% (Figure 5). The markers in the genetic map had high colinearity with the reference genome TO1000 (Figure 6).

QTLs of Hollow Stem Trait in Broccoli
Broccoli stems trait in the DH population was surveyed in 3 years. The stem traits were surveyed as hollowness or no-hollowness in 3 years. The correlation of phenotypic data in the DH group in 3 years was analyzed. The correlation between 2014s data and 2015s data was 0.805, one between 2014s and 2017s was 0.726, one between 2015s and 2017s was 0.817. The heritability of hollow stem trait was about 71% (Supplementary Table S3).

DISCUSSION
High-quality genetic maps are important for identifying genes of complex traits. The genomes of B. oleracea were sequenced and assembled successfully Parkin et al., 2014). In this study, the markers in the genetic map have high collinearity with the reference genome TO1000, which indicated that it was reliable that SLAFs were developed and the genetic map was constructed based on the reference genome. However, 75.99% of all clean reads were mapped on the reference genome and 24.01% reads couldn't be mapped on it.
Probably, because one genotype in a species sequenced with high-resolution couldn't contain all sequences information of this species. The reference genome TO1000 could not cover all sequences of B. oleracea. For example, even in the two elite indica rice varieties' genomes ZS97RS1 and MH63RS1, there were 1,891 non-TE genes present in the ZS97RS1 genome but not sequenced in the MH63RS1 genome, and conversely 2,957 non-TE genes present in the MH63RS1 genome but not sequenced in ZS97RS1; and there were a surprisingly large number of structural variations including inversions, translocations, and presence/absence variations between them .
Broccoli and kale-like line TO1000 are two different varietas in B. oleracea. They have remarkable phenotype differences. Among them, the most important difference is that broccoli has a big flower head and TO1000 hasn't. So, there might be many differences in genome sequences and structural variations between the two varietas, which could cause some reads in broccoli couldn't be mapped on the reference genome.
The genetic map was constructed by a DH segregate group attained by microspore culture. In microspore culture, production of embryos and regenerated plants depend on genotypes to a certain degree in B. oleracea (Gu et al., 2014), which could affect construction of a genetic map. Here, the average marker interval of the map was as short as 0.22 cM, however, there were still two gaps more than 10 cM, which were on the LG7 and LG9, respectively. And there were a certain amount of distorted markers in this genetic map. The reason for them is presumably that microspore culture might select a certain genotypes to produce embryos, which could cause segregation distort and no individuals with recombination event on the two genome intervals in the segregation population.
The heritability of hollow stem trait was about 71%, which showed that although the hollow stem trait was genetic, it was affected by environments. It was consistent with our observation over years and other's finding (Boersma et al., 2013). During the survey of the six generations (maternal and paternal line, hybrid F 1 , F 2 , and two backcross groups), it was observed that high temperature and drought could make hollowness serious, while low temperature and enough water could reduce the hollowness. But in all environments tested, maternal line DH16-2 was always hollow and paternal line DH28-4 was never hollow. These observations and the result in this study could show that the hollowness in broccoli stems was controlled by genes and meanwhile affected by external environment. Besides, hollow stems also occur in other vegetable crops such as tomato and beans (Aloni and Pressman, 1981;Huberman et al., 1993;Carr and Jaffe, 1995). Drought stress and exogenous plant growth regulation such as ethephon would induce stems hollow (Huberman et al., 1993). Hollow stems were found to develop just after inflorescence initiation in broccoli, then stems enlarged rapidly and starch content in the pith decreased (Boersma et al., 2013). Boron deficiency was blamed for the underlying cause of hollow stem in broccoli and cauliflower crops (Shelp and Shattuck, 1987). However, Boersma et al. (2013) didn't observe many symptoms associated with a boron deficiency in broccoli and cauliflower. No consistent results and relative genes on hollow stems in broccoli and cauliflower were found.
Hollow stem trait in broccoli is significant in production and was reported 50 years ago (Boersma et al., 2013). But, this trait in stem is complex, and the relevant physiological and genetic research had rarely been reported, potentially because it was difficult to measure the phenotype of the hollow stem. In this study, the phenotype was simply recorded as hollowness and no-hollowness in the 3 years' evaluation. Accordingly eight loci were identified for the hollow-stem trait. But most of them could explain very low phenotypic variation. Only one locus, QHS.C09-1, could explain higher phenotypic variation, about 15.6 percent. And this locus also partly overlapped with the locus, QHS.C09-2, which was identified using BLUE values. Therefore QHS.C09-2 was presumed to be the major locus related to hollow stem in broccoli.
In this study, though a major locus was detected, there are still many queries which need to be uncovered in the future. For example, cavities in broccoli hollow stems were irregular (Boersma et al., 2013), such as some are short in transection width but long in longitudinal direction, while some the opposite. Is there any possibility that different locus or gene would control, respectively, transection width and longitudinal direction of a cavity. Maybe, the method to measure shape of rice grain, which has been evaluated by grain length (GL), grain width (GW), and grain thickness (GT), could be referred to in future study. Many different QTLs controlling them in rice have been identified and some genes related to GL, GW and GT have been cloned, respectively (Fan et al., 2006;Song et al., 2007;Weng et al., 2008;Li et al., 2011;Wang S.K. et al., 2012;Zhang et al., 2012).

AUTHOR CONTRIBUTIONS
HY and HG designed and performed the experiments. JW surveyed the phenotype. ZZ were involved in the data analysis. XS and YS performed the microspore culture. FB was a cooperator in the European project.