Identification of Candidate Genes Regulating the Seed Coat Color Trait in Sesame (Sesamum indicum L.) Using an Integrated Approach of QTL Mapping and Transcriptome Analysis

Seed coat color is an important seed quality trait in sesame. However, the genetic mechanism of seed coat color variation remains elusive in sesame. We conducted a QTL mapping of the seed coat color trait in sesame using an F2 mapping population. With the aid of the newly constructed superdense genetic linkage map comprised of 22,375 bins distributed in 13 linkage groups (LGs), 17 QTLs of the three indices (i.e., L, a, and b values) of seed coat color were detected in seven intervals on four LGs, with a phenotype variance explanation rate of 4.46–41.53%. A new QTL qSCa6.1 on LG 6 and a QTL hotspot containing at least four QTLs on LG 9 were further identified. Variants screening of the target intervals showed that there were 84 genes which possessed the variants that were high-impact and co-segregating with the seed coat color trait. Meanwhile, we performed the transcriptome comparison of the developing seeds of a white- and a black-seeded variety, and found that the differentially expressed genes were significantly enriched in 37 pathways, including three pigment biosynthesis related pathways. Integration of variants screening and transcriptome comparison results suggested that 28 candidate genes probably participated in the regulation of the seed coat color in sesame; of which, 10 genes had been proved or suggested to be involved in pigments biosynthesis or accumulation during seed formation. The findings gave the basis for the mechanism of seed coat color regulation in sesame, and exhibited the effects of the integrated approach of genome resequencing and transcriptome analysis on the genetics analysis of the complex traits.


INTRODUCTION
Sesame (Sesamum indicum L.) is a specific and important oilseed crop with a long cultivation history (Bedigian, 2003). At present, sesame is mainly grown in Asia, Africa, and Latin America. Sesame seeds possess abundant nutritional substances, such as unsaturated fatty acids, proteins, digestible fiber, and beneficial antioxidants (such as lignans) (Namiki, 2007;Kumar et al., 2013;Ha et al., 2017;Keskin, 2019). For sesame seeds, seed coat color varies from white, shallow yellow, yellow, golden, brown, gray, reddish brown, and other medium colors, to black. In particular, the white-and the black-seeded seeds are more popular for consumption in the world. Black sesame seeds are shown to possess abundant nutritional facts (Ganesan and Xu, 2017;Khoo et al., 2017). Some studies showed that the contents of some substances, such as phenolics, vitamins, and phloretin, in black sesame seeds are high (Ha et al., 2017;Wang et al., 2018). In China, black sesame seeds have also been used as a traditional Chinese medicine in the last 3,000 years Zhang H. et al., 2019).
In sesame, seed coat color trait is considered as a quantitative trait .  firstly performed the genetic background and QTL mapping analysis of sesame seed coat color with an F 2 hybridization population using simple sequence repeat (SSR), amplified fragment length polymorphism (AFLP), and random selective amplification of microsatellite polymorphic loci (RSAMPL) markers. Four QTLs were located on three LGs with the phenotypic variation from 9.6 to 39.95% . Subsequently,  investigated the seed coat color indices of the L, a, and b values of a recombinant inbred lines (RIL) population and determined four QTLs on a genetic map constructed by restriction-site associated DNA sequencing (RAD-Seq). The four QTLs were found to be on three LGs and with the phenotypic variation from 3 to 46%. Wei et al. (2015) performed a genome-wide association study (GWAS) for the seed coat color and some other traits using a panel of 705 sesame accessions and determined seven loci associated with the seed coat color. Meanwhile, Du et al. (2019) detected seven QTLs related with seed coat color on three LGs using specific length amplified fragment sequencing (SLAF-seq) and an F 2 population. Wang et al. (2020) identified 20 candidate genes associated with pigment biosynthesis using transcriptome data from a black and white sesame variety. However, only the SiPPO (polyphenol oxidase) gene was determined for the seed coat color in sesame until now (Wei et al., 2015).
For angiosperm plants, the seed coat color trait is predominantly determined by the content of various pigments, which mainly belong to the secondary flavonoid metabolites, such as flavonols, anthocyanins, and proanthocyanidins (PAs), and are synthesized through the flavonoid biosynthesis pathway (Winkel-Shirley, 2001). At present, dozens of genes regulating the seed coat color regulation have been detected in crops; of which, most genes encode the enzymes related to flavonoid biosynthesis (such as C2 in maize, and TT6 and TT7 in Arabidopsis), transfer proteins (ZmMRP3 in maize and TT12 and AHA10 in Arabidopsis), or regulatory factors (TT1 and TT10 in Arabidopsis) (Pourcel et al., 2005;Marinova et al., 2007;Owens et al., 2008;Badone et al., 2010;Han et al., 2010;Appelhagen et al., 2011Appelhagen et al., , 2015Eloy et al., 2017). Flavonoid biosynthesis pathway is highly conserved in different plant species (Dong et al., 2001;Tohge et al., 2017). To date, numerous transcription factors, such as MYBs, basic helix-loop-helix (bHLHs), Transparent Testa Glabra1 (TTG1), and MYB-bHLH-WDR (MBW) ternary complexes, have been found to be participating in the regulation of flavonoid biosynthesis pathway in a tissue-specific manner (Gonzalez et al., 2009;Xu et al., 2014). In addition, post-translation modification and microRNA have also been proven to be related to the flavonoid biosynthesis pathway (Wang H. et al., 2016).
In this study, in order to identify the major effect of genes underlying sesame seed coat color, we performed a QTL mapping for the sesame seed coat color trait based on a superdense genetic linkage map constructed using an F 2 population and genome resequencing technology. The transcriptome comparison analysis of a black-and white-seeded variety during seed development was also carried out. As a result, a total of 28 candidate genes were determined for the seed coat color trait in sesame according to the integration of QTL mapping and transcriptome analysis. The findings provide a deep understanding of the genetic mechanism underlying the seed coat color regulation in sesame.

Materials and Data
Two varieties, Yuzhi DS899 (white-seeded, mutated from a sibling of Yuzhi 11) and JS012 (black-seeded), and their 120 F 2 individuals were used for the QTL mapping in this study . The F 2 population and both parents were grown in Sanya (109 • 50 E and 18 • 25 N), Hainan, China in 2014 for seed coat color trait evaluation. For the transcriptome analysis, two varieties with a contrasting seed coat color, i.e., Yuzhi 11 and the black-seeded parent JS012, were cultivated in Yuanyang (113 • 97 E and 35 • 05 N), China in 2020. Twenty capsules from 10 plantlets of each variety at the time points 5, 10, 15, 20, and 25 DAF, respectively, were collected. The developing seeds were then peeled from the capsules, frozen in liquid nitrogen, and stored at -80 • C for RNA extraction. Three biological replicates were set for RNA sequencing. All the materials above were chosen from the sesame germplasm reservoir of Henan Sesame Research Center, Henan Academy of Agricultural Sciences (HSRC, HAAS).
The genome resequencing data (reserved in NCBI Bioproject PRJNA315474) of Yuzhi DS899 and JS012 and their 120 F 2 individuals  were used for the genetic map construction and QTL mapping in this study. The sesame genome of var. Yuzhi11 (white-seeded; version 2)  was used as the reference genome for genotyping, variant digging, and candidate gene screening analysis.

Seed Coat Color Trait Evaluation
Seed coat color of the F 2 family was investigated based on a single plant. For each of the sequenced 120 F 2 individuals above, the seeds were harvested after the plant was totally matured. The seeds from each individual were randomly divided into three equal pieces (used as three replicates) after the unmatured seeds were discarded. As to evaluate the seed coat color, the seeds were put into a quartz cell, and then scanned under the Colorflex EZ spectrophotometer (Hunter Associates Laboratory, United States). Three indices (i.e., L, a, and b values) were measured and recorded. Of which, "L" defines the luminosity of the seed coat, with a range from -100 for black to 100 for white; "a" and "b" indicate the shade of color pairs, with a range from -60 for green to +60 for red, and from -60 for blue to +60 for yellow, respectively. For each of the indices, the mean value of the three replicates was finally used.

Construction of a Superdense Genetic Linkage Map for Sesame
Sequencing reads of the parents Yuzhi DS899 and JS012 and their 120 F 2 individuals  were firstly filtered using the Trimmomatic 0.33 software (Bolger et al., 2014), and then mapped to the reference sesame genome (cv. Yuzhi 11)  using the Burrows Wheeler Aligner (BWA) 0.7.17 software with the default settings (Li and Durbin, 2009). Joint variant calling was performed using the GATK 4.0 program following its best practice (Van der Auwera et al., 2013). Genotype coding was performed using the GenosToABH plugin in TASSEL 5.2.43 (Glaubitz et al., 2014). The homozygous genotype of the parent Yuzhi DS899 was coded as "A, " the parent JS012 as "B, " and the heterozygous genotype of their hybrid as "H." Since several consecutive heterozygotes ("H") would be called as identical homozygotes ("A" or "B") within a very short distance due to the low reads coverage, the coded genotype was processed according to Miao et al. (2018) using the following procedures to improve the accuracy of linkage map construction: (1) combining the consecutive and redundant (with identical information for all individuals in the progeny) markers within a short region into a representative marker, and the minimum length between two continuous markers was set to 100 bp (base pair); (2) correcting the genotype of representative markers based on a sliding window of 15 markers; (3) based on the corrected markers, grouping the adjacent and redundant markers into bins; and (4) filtering out the bins of distorted segregation with the number of plants with "A" or "B" genotypes <20, and the number of plants with H genotypes <40. The threshold for distorted segregation loci was selected based on a pilot study of linkage map construction, in which we found that at least 80 individuals (with ideal segregation ratio 1:2:1 for genotype "A, " "B, " and "H") were needed to form 13 linkage groups (corresponding 13 chromosomes of sesame).
The genetic map was constructed based on the filtered genotype matrix using the Lep-MAP3 program with the default parameters and a logarithm of the odds (LOD) of 12.0 to separate chromosomes (Rastas, 2017). LGs with less than 100 markers were discarded. The Kosambi function was applied to transfer the recombinant rate into the genetic distance (Morgan).

QTL Mapping and Comparison
QTL mapping was performed using the QTL Cartographer 2.5 , QTL.gCIMapping.GUI v1.0 Zhang Y. W. et al., 2019), and QTL IciMapping 4.2.53 (Meng et al., 2015), respectively. The Composite interval mapping (CIM) method in the QTL Cartographer, the Genome-wide composite interval mapping (GCIM) method in the QTL.gCIMapping.GUI, and the Inclusive composite interval mapping (ICIM) and Interval mapping (IM) methods in the QTL IciMapping were individually used to detect the QTLs and to estimate the effects. For the CIM method, the map was scanned in 2 cM (centimorgan) intervals with a window size of 10 cM, and the LOD was determined by running 1,000 permutation tests. For the GCIM method, random model, critical LOD score = 2.5, and walk step for genome wide scanning = 1 cM were used. For the ICIM and IM methods, LOD was determined by running 1,000 permutation tests at type I error equals 0.05, and walk step was set at 1 cM. The ratio of phenotypic variance explained by genotype was taken from the marker at the peak QTL position. QTL physical position was determined by the flanking markers.
To make a distinction between the new QTLs identified in this study and from the reported results, QTL position comparison was performed with flanking sequences using the MUMER 4 program (Marcais et al., 2018). The flanking sequences of the QTLs or the associated loci were applied as the queries for alignment against the sesame reference genome (cv. Yuzhi 11) .

RNA-Seq and Analysis
Total RNA of each sample was extracted using the RNAiso Plus Reagents (TaKaRa, Dalian, China). Genomic DNA was removed by DNase I treatment. RNA-seq libraries were constructed and sequenced on the Illumina platform HiSeq 2500 (Illumina, Inc., San Diego, CA, United States). The clean reads were mapped to the reference sesame genome using the STAR 2.7.0 (Dobin et al., 2013). The reads mapped to each genomic feature were counted using the featureCounts 2.0.1 . Default parameters were used for both read mapping with STAR and read counting with featureCounts. Differentially expressed genes (DEG) were screened and analyzed using the DESeq 2 software with the default parameters (Love et al., 2014). False discovery rate (FDR) adjusted p-values < 0.05 and absolute value of log 2 (fold change) > 1 were set as the cutoff for the DEGs.

Gene Function Annotation, Homologous Gene Screening, and Variant Effect Prediction
Gene function annotation was performed using the eggNOG-Mapper 5.0.1 (Huerta-Cepas et al., 2017). Gene ontology (GO) plotting was conducted in the WEGO 2.0 (Ye et al., 2018). Homologous gene screening in the target intervals was conducted using the BLAST + 2.2.31 with the E-value < 1e-10. The effect of gene variant was estimated using the SnpEff 4.1 (Cingolani et al., 2012), and the variants that caused frame shift, stop gain, stop lost, start lost, splice acceptor, splice donor, exon loss, frame-shift, or nonsense mutation were regarded as high-impact and screened according to Cingolani et al. (2012). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed using the R package clusterProfiler 3.12 with the default parameters (Yu et al., 2012), and a q-value < 0.05 was set as the cutoff.

Seed Coat Color Genetic Analysis of the Mapping Population
In order to clarify the genetic background of the seed coat color in sesame, we firstly investigated the phenotypic variation of the seed coat color trait of the F 2 population in 2016 (Supplementary Table 1 and Supplementary Figure 1). L, a, and b values of the white-seeded parent Yuzhi DS899 were 62.81, 5.50, and 19.62, respectively, while those of the black seeded parent were 21.62, 6.73, and 7.27, respectively. All the F 1 seeds were black, whereas F 2 plants exhibited an obvious variation in seed coat color with various values of L, a, and b (Supplementary Figure 1). As to the 120 F 2 plants, "L" values ranged from 16.23 to 62.81, with a standard deviation of 8.40; the "a" values ranged from 1.20 to 9.34, with a standard deviation of 2.67; while "b" varied from 1.42 to 19.62, with a standard deviation of 4.27 (Supplementary Table 1). In particular, we found some F 2 plants that presented a darker seed coat color than the black parent (Supplementary Figure 1). However, the color of all the F 2 plantlets was darker than the white-seeded parent.
We further performed the pairwise correlation analysis of the three indicators of the seed coat color in the population (Supplementary Table 1). The results showed that the L, a, and b values were significantly correlated with each other. The L, a, and b values of the F 2 population were all out of the normal distribution (Figure 1 and Supplementary Table 1), indicating the existences of a few major QTLs controlling the three indicators.

Construction of a Superdense Genetic Linkage Map
Before determining the QTLs for the seed coat color trait, we firstly constructed a superdense genetic map using the F 2 population above and the genome resequencing data. Based on the reference genome (var. Yuzhi 11), a total of 1,264,646 SNP/InDel variants for the two parents and their 120 F 2 plants were identified (Supplementary Table 2). The number of unique variants for the two parents Yuzhi DS899 and JS012 was 122,361 and 528,671, respectively. After filtered, a total of 425,011 variants were successfully genotyped into "A, " "B, " or "H" types in the population (Supplementary Table 2).
Then, we constructed the matrix consisting of the above 425,011 genotype codes for the 120 F 2 individuals and fed it to the linkage map construction pipeline ( Table 1). As the bins with the distorted segregation (the numbers of plants having A/B genotypes < 20 and H genotypes < 40) were excluded, a total of 22,375 bins containing 380,544 SNP/InDel markers were obtained for the genetic map construction (Table 1 and  Supplementary Table 3). As a result, a genetic map with 13 LGs was successfully constructed (Figure 2). The total length of the genetic map was 1,576.14 cM, and the length per LG ranged from 96.96 to 157.96 cM. The average marker interval was 0.82 cM per bin marker. The marker density was 14.20 bins or 241.44 SNP/InDel markers per cM. In addition, 34 gaps with the length ≥ 5.0 cM remained in the genetic map, with the largest gap size of 18.27 cM ( Table 1).

QTL Mapping of the Seed Coat Color Trait in the F 2 Population
We then performed a QTL mapping of the seed coat color in the crossing population using four methods (GCIM, ICIM, IM, and CIM) ( Table 2 and Supplementary Table 4). The results from the four methods were similar, and QTLs were found on LG3, 5, 6, and 9 for the three seed coat color indicators L, a, and b ( Table 2 and Supplementary Table 4). Of these, the QTL intervals observed using the CIM method were usually wider, while the intervals from the GCIM method were narrower. The overlapping relationship and boundaries between the QTLs were obvious (Supplementary Table 4). Therefore, based on the QTL interval boundaries, we summarized the QTL mapping resulting into 17 QTLs, including 5, 4, and 8 QTLs for L, a, and b, respectively ( Table 2). The genetic effect (the explanation rate of phenotype variance or V G /V P value) of the QTLs on the seed coat color trait ranged from 4.66% (for qSCb5.2 estimated by GCIM) to 41.53% (for qSCL9.3 estimated by ICIM). 1 Only the marker interval with the size of >5.0 cM was considered as a gap.
FIGURE 2 | The superdense genetic map constructed using an improved pipeline and the F 2 population derived from a cross between Yuzhi DS899 and JS012.
Due to the overlapping positions of the QTLs controlling L, a, and b, a total of seven non-overlapping intervals, with one on LG3 (43.75-47.09 cM), one on LG5 (20.44-22.11 cM), one on LG6 (78.80-84.23 cM), and four on , were identified for the sesame seed coat color in this study ( Table 2). As the QTLs intervals on LG9 were adjacent to each other, we thus named the QTL hotspot "qSC cluster" for the seed coat color trait. By further merging this qSC cluster into one interval, a total of four intervals (hereinafter referred to as "qSC intervals") on four LGs were thus harvested for the seed coat color trait in sesame, i.e., and LG9: 9.17-30.40 cM ( Table 2).

Genes in the qSC Intervals
We further screened the genes in the four qSC intervals according to the fine sesame reference genome ( Table 3). The results showed that the above qSC intervals were located into six contigs (i.e., C34, C_2_7, C4, U0167, C3, and C_2_3) in the reference genome. The physical size of the six contig segments varied from 26.80 Kb (kilobases) (in contig U0167) to 2,090.72 Kb (in contig C_2_3). A total of 625 genes were detected in the six contig segments, and the number of genes in each of the contig segments varied from one on contig C4 to 235 on C_2_3 (Table 3 and  Supplementary Table 5).
In order to determine the candidate genes, we firstly compared the two parents and determined that a total of 10,719 SNP/InDel  variants existed in the six target contig segments between the parents (Table 3 and Supplementary Table 6). Of which, 9,393 variants were unique in JS012 (black-seeded) compared with Yuzhi DS899 (white-seeded) and Yuzhi 11 (the variety used for reference genome sequencing, white-seeded), and thus, cosegregated with the seed coat color trait. Annotation results of SnpEff indicated that 393 variants in 84 genes caused frame shift, stop gain, stop lost, start lost, splice acceptor, splice donor, exon loss, frame-shift, or nonsense mutation, and were named as high-impact variants ( Table 3 and Supplementary Table 6).

DEGs of White-and Black-Seeded Varieties
To further explore the characteristics of the genes related with the seed coat color trait, we compared the expression profiles of the developing seeds in a white variety (Yuzhi 11) and a black variety (JS012) (Supplementary Figure 2). GO annotation showed that the DEGs from the five stages presented a similar functional profile (Supplementary Figure 3). Most DEGs were involved in metabolic and cellular biological processes and the molecular functions of catalytic activity and binding. We further performed the KEGG enrichment analysis and found that the DEGs were significantly enriched in 37 pathways (q_value < 0.05), including pigment biosynthesis related phenylpropanoid biosynthesis (KEGG: ko00940, at the stages 10, 15, 20, and 25 DAF), flavonoid biosynthesis (KEGG: ko00941, at the stages 10, 15, and 25 DAF), and anthocyanin Biosynthesis (KEGG: ko00942, at the stage 10 DAF) pathways (Supplementary Table 9 and Figure 3). Notably, the three pigment biosynthesis related pathways above were enriched after the stage 5 DAF, and no genes in the three pathways were located in our qSC intervals (Supplementary Table 9).

Seed Coat Color Candidate Genes Identification in White-and Black-Seeded Varieties
In order to screen the candidate genes related with the seed coat color regulation in sesame, we compared the expression profiles of the 625 genes in the qSC intervals during the seed development in the two varieties (Supplementary Table 10). Of the 625 genes, 62, 138, 101, 83, and 135 were differentially expressed at the stages 5, 10, 15, 20, and 25 DAF, respectively (Supplementary Table 10). Considering that the pigment was accumulated in JS012 gradually from 8 DAF and that no enrichment of the seed coat color formation related pathways were observed at the stage 5 DAF, we took the stage 5 DAF as the control and the other stages as the case to search for the candidate genes, and a Venn diagram consisting of the genes that were DEGs at the stages 10, 15, 20, and 25 DAF and not-DEGs at the stage 5 DAF was created ( Figure 4A). We found 223 genes (the sum of the red numbers in Figure 4) that were DEGs in at least at one of the stages 10, 15, 20, and 25 DAF but not at 5 DAF. By selecting the intersection of the 223 genes and the 84 genes containing high-impacts variants (and co-segregated with the seed coat color trait; Supplementary  Table 6), 28 genes were identified ( Figure 4B).
The 28 genes were selected as the highly-potential candidate genes for the sesame seed coat color. Of the 28 candidates, seven were located on LG3, 2 on LG5, and 19 on LG9, while no genes were found on LG6 ( Table 4). The 28 genes had various  Table 4). Of the 28 genes, 14 were expressed predominantly in JS012 (Supplementary Figure 4A), 11 expressed predominantly in Yuzhi 11 (Supplementary Figure 4B), and the other three genes presented the hybridized expression pattern (Supplementary Figure 4C).

The Multiple-Fold Improvement of the Sesame Genetic Linkage Map
In this study, a superdense genetic linkage map was constructed for sesame by genome resequencing and an updated map construction pipeline. Compared with the published genetic maps (Zhang Y. et al., 2013;Zhang et al., 2018;Wu et al., 2014;Uncu et al., 2016;Mei et al., 2017;Du et al., 2019;Asekova et al., 2021;Liang et al., 2021), the new genetic linkage map contains the highest number of both SNP/InDel markers (380,544) and bins (22,375) for sesame to date.
During the genetic map construction, the genotype data which resulted from genome resequencing usually have many noisy genotypes and always lead to inaccurate co-segregating marker (bins) grouping. To overcome the problem above, we developed a genetic map construction pipeline in 2016 and updated the method in this study. The previous pipeline basically consists of five steps: (1) calling variants individually; (2) choosing the variant loci that were called in either parent and in at least 20 progenies; (3) filtering out the loci that were heterozygous in either parent or that have a read depth under five in any progeny; (4) genotype coding; and (5) genotype correction . In this study, the pipeline was improved, and the genotype matrix for map construction was prepared as follows: (1) calling variants jointly; (2) performing genotype coding; (3) combining the consecutive same markers within a short region; (4) genotype correction; and (5) filtering out the loci of distorted segregation. In the updated pipeline, joint variant calling was applied with a higher efficiency than the previous single-sample variant calling and loci filtration strategy. Joint variant calling shared the information across all samples in a cohort and therefore had greater sensitivity, particularly for the site with a low sequence coverage or misalignment (Van der Auwera et al., 2013). As to the loci filtration strategy, in order to "rescue" the deleted but informative sites, we just filtered out the loci with distorted segregation and omitted the filtration step before genotype correction. Thus, more makers and bins could be applied for map construction. In the study, the located SNP/InDel makers increased from 30,193 to 380,544 in the genetic map, with more than an 11-fold increase. The number of bins in the new map increased from 3,041 to 22,375, with more than a six-fold increase. Furthermore, to improve the accuracy of the pipeline, we had two times of merging involved in the pipeline, i.e., combining the consecutive and redundant markers within a short region into a representative marker, and grouping the adjacent and redundant markers into bins. The involvement of merging twice lead to an average of 17 markers per bin in the new map, and, thus, the new map should be of a greater accuracy.

The Superdense Genetic Map Had More Robustly Dissected the Genetic Mechanism Underlying the Seed Coat Color Trait
Seed coat color is one of the earliest traits studied in crops for genetics analysis and Mendel's law discovery. In sesame, seed coat color seems to be correlated with the species evolution and the beneficial effects on seed nutrition (Namiki, 2007;Zhang et al., 2012). Based on our investigation, the sesame seed In these references, the corresponding genes had been proved or suggested to be involved in pigments biosynthesis or accumulation during seed formation.
coat color is stable under various environments, and is mainly influenced by the seed ripening level. With the aid of a molecular genetic map,  detected four QTLs for the seed coat color on three LGs using an F 2 population derived from a cross between a white-seeded parent and a black-seeded parent. Wei et al. (2015) identified seven loci on three LGs for the trait by GWAS using a natural population.  identified four QTLs on three LGs for the trait using a population consisting of 430 RILs. Lately, Du et al. (2019) found seven seed coat color related QTLs on three LGs. The results suggest that the seed coat color trait is controlled by a few major QTLs in sesame, and our phenotype analysis also agrees with this. In the present study, 17 QTLs in seven non-overlapping intervals on four LGs were determined for sesame coat color. Comparison of the results indicated that our QTLs covered all the previous published QTLs, except two loci, i.e., qSCL-8.1 identified by  and qsccZ12 identified by Du et al. (2019). High consistency reflects the reliability of the QTL mapping results in this study. Furthermore, we found that the specific QTL qSCL-8.1 reported by  was close to the qSC cluster in this study. We proposed that qSCL-8.1 should be another member of the cluster. The QTL qsccZ12 was identified based on the color space index Z, which is different from the indices used in this study, and had the lowest phenotypic variation (5.58%) among the seven QTLs reported by Du et al. (2019). This may be the reason why we missed it in this study. Comparing with these previous studies, we (1) identified a new QTL for the seed coat color, i.e., qSCa6.1, and (2) successfully distinguished more QTLs than the previous studies.

Candidate Genes for the Seed Coat Color in Sesame
In this study, in order to clarify the expression profiles of the candidate genes above, we also analyzed and compared the seed transcriptome during the seed development in the white-and black-seeded varieties. Compared to the stage 5 DAF, the DEGs at the following developing stages seem more important for the sesame seed coat color formation. Previous investigation indicated that pigment accumulation was usually observed approximately from 8 DAF in the black variety JS012 (data not shown). In this study, we determined that the pigment formation related pathways were not enriched at the time point 5 DAF. We thus used the DEGs at the stage 5 DAF to filter the other stages for candidate gene identification. Similarly, Wang et al. (2020) did a detailed research using RNA-seq data from a black-and white-sesame, and proposed that the genes controlling the sesame seed coat color were mainly expressed after 5 DAF, and thus took the 5 DAF as the initiated control and grouped 11-20 DAF to identify the candidate genes linked with pigment biosynthesis. By integrating the QTL mapping and transcriptome analysis, we determined 28 genes as the candidate genes for the seed coat color in sesame (Table 4). Of the 28 candidate genes, the gene SiPPO (C_2_7.207) has been proved to determine the seed coat color in sesame (Wei et al., 2015), while the genes C_2_3.819 and C34.285 were annotated as transcription factor MYB-like and bHLH-like, respectively (Table 4). MYB and bHLH domain proteins act as a key regulator in PA accumulation in a developing seed (Nesi et al., 2000(Nesi et al., , 2001. Meanwhile, the gene C34.291 was annotated to encode a pyridoxine/pyridoxamine 5 -phosphate oxidase (PPOX) gene, which was suggested to be related to pigment biosynthesis in both plant (Sang et al., 2011) and human (Frank et al., 1999). The gene C_2_7.220 encodes a dioxygenase gene that was proven to be involved in plant pigments biosynthesis (Polturak and Aharoni, 2018). The gene C_2_3.683 encodes a xyloglucan endotransglucosylase/hydrolase 5 (XTH5) genes, and XTH5 has been reported to be involved in fruit pigment accumulation in the ripening processes in tomato (Wang et al., 2019). In addition, we found that the genes C_2_3.773 and C_2_3.787 were annotated as E3 ubiquitinprotein ligases, which were suggested to play important roles in tomato pigment accumulation (Tang et al., 2015). The genes C_2_3.774 and C_2_3.850 encode triose phosphate/phosphate translocator and cytochrome P450 gene, respectively, and were also suggested to be related to pigments biosynthesis in many plants (Tanaka and Brugliera, 2013;Hilgers et al., 2018). To sum up, 10 out of the 28 candidate genes were proved or suggested to be involved in pigments biosynthesis or accumulation. The results exhibited the high confidence of the candidate gene detection. However, we should remember that some SNP and InDels existed in the upstream and downstream sequences of the genes that have been omitted in the study. In the future, the function of the candidate genes and other genes in the target intervals should be further validated to clarify the regulation for the seed coat color trait in sesame.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm. nih.gov/, PRJNA315474 and PRJNA742439.

AUTHOR CONTRIBUTIONS
CL performed the data analysis and drafted the manuscript. YD conducted the main experiments and participated in the data analysis. HM guided the experiments and drafted the manuscript. MJ conducted the cross-section. LW performed the genetic experiments. HZ conceived and designed the experiments, and guided manuscript correction for publishing. All authors have seen and approved the final version of the manuscript. Supplementary Table 1 | Statistics of the L, a, and b values of the seed coat color trait in the F 2 population derived from the cross between Yuzhi DS899 and JS012.