Identification of QTLs and allelic effect controlling lignan content in sesame (Sesamum indicum L.) using QTL-seq approach

Sesame (Sesamum indicum L.), an oilseed crop, is gaining worldwide recognition for its healthy functional ingredients as consumption increases. The content of lignans, known for their antioxidant and anti-inflammatory effects, is a key agronomic trait that determines the industrialization of sesame. However, the study of the genetics and physiology of lignans in sesame is challenging because they are influenced by multiple genes and environmental factors; as a result, the understanding of gene function and synthetic pathways related to lignans in sesame is still limited. To address these knowledge gaps, we conducted genetic analyses using an F7 recombinant inbred line (RIL) population derived from Goenbaek and Gomazou, low- and high-lignan varieties, respectively. Using the QTL-seq approach, we identified three loci, qLignan1-1, qLignan6-1, and qLignan11-1, that control lignan content, specifically sesamin and sesamolin. The allelic effect between loci was evaluated using the RIL population. qLignan6-1 had an additive effect that increased lignan content when combined with the other two loci, suggesting that it could be an important factor in gene pyramiding for the development of high-lignan varieties. This study not only highlights the value of sesame lignan, but also provides valuable insights for the development of high-lignan varieties through the use of DNA markers in breeding strategies. Overall, this research contributes to our understanding of the importance of sesame oil and facilitates progress in sesame breeding for improved lignan content.


Introduction
Sesame (Sesamum indicum L.) is a major oil crop and a special crop in East Asia, including Korea, China, and Japan, with a focus on its functional ingredients (Langyan et al., 2022). With an oil content of approximately 50% within its seeds (Uzun et al., 2008; Wang et al., 2014; Wei et al., 2015), sesame is an optimal candidate for industrial oil production and use. Extensive research has highlighted the nutritional advantages and health effects of sesame oil in humans, which include anti-inflammatory and anti-cancer activity, antioxidant properties, and nootropic effects (Yokota et al., 2007; Monteiro et al., 2014; Wan et al., 2015; Xu et al., 2015; Kim et al., 2023). Lignans, which occur predominantly in sesame oil, are categorized into fat-soluble and water-soluble types (Dar and Arumugam, 2013). Among these, sesamin and sesamolin are the major components of the fat-soluble aglycons (Dar and Arumugam, 2013; Michailidis et al., 2019), demonstrating the substantial health-promoting potential of sesame both clinically and pharmacologically (Majdalawieh et al., 2017). Given their significance, in-depth exploration of functional elements such as oil and lignans may significantly elevate the agricultural value and industrial potential of sesame.
In recent years, major crop breeding strategies have changed significantly, moving from traditional breeding using physical markers to the concept of marker-assisted selection (MAS) employing contemporary molecular markers (Collard and Mackill, 2008; Nadeem et al., 2018). In addition, the availability of high-quality reference genomes and pan-genomes has made a variety of genomic studies possible, such as genetic mapping, structural variant studies of whole genomes, and syntenic analyses that compare regions between and within species (Bayer et al., 2020; Khan et al., 2020; Sun et al., 2022). Progress in DNA markers based on next-generation sequencing (NGS) technology has made it possible to use genome-based markers such as single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels), which are higher in throughput than the earlier RFLPs, RAPDs, and SSRs. This has led to rapid and accurate breeding as well as detailed research at the gene level, giving rise to a paradigm of digital breeding (Collard and Mackill, 2008; Ray and Satya, 2014; Nadeem et al., 2018; Varshney et al., 2021).
QTL-seq is an analytical technique that detects significant genetic regions associated with a trait by pooling individuals with contrasting phenotypes and comparing their genotypes to identify quantitative trait loci (QTLs) (Takagi et al., 2013). This approach combines bulked-segregant analysis (BSA), which previously used a small number of markers, with whole genome sequencing (WGS) using NGS technology to develop and use a large number of markers at low cost, thus improving the efficiency and accuracy of genetic studies. Unlike traditional QTL analysis in a bi-parental population, where all individuals must be genotyped to detect QTLs, QTL-seq mixes individuals with the same phenotype into groups, assumes that all genetic regions are equally likely to segregate due to random crossover, and identifies as statistically significant only those genetic regions that are extremely segregated between groups (Abe et al., 2012; Takagi et al., 2013; Sugihara et al., 2022). Several studies in rice, maize, and tomato have demonstrated that QTL-seq provides comparably reliable results, especially for genotyping, thus overcoming the high cost of conventional QTL analysis and genome-wide association studies (GWAS) (Chen et al., 2021; Topcu et al., 2021; Yang et al., 2021). A strategy that uses low-cost and time-saving methods such as QTL-seq to find useful trait-associated loci in a large number of materials, drawn from different resources, environments, and ecotypes, is therefore essential for accelerating breeding. Secondary metabolites, in particular, are regarded as typical quantitative traits and are known to be influenced by a variety of hereditary factors (Alseekh et al., 2015).
In this study, we aimed to find genetic factors controlling sesame lignan content by applying QTL-seq to an F7 recombinant inbred line (RIL) population constructed from parents with different lignan content. Additionally, we examined the impact of QTL stacking by comparing lignan content according to the haplotypes at the identified QTLs, which will serve as a basis for breeding and selecting new sesame varieties with improved economic value and functional content.

Plant materials and phenotypic evaluation
A total of 257 F7 RILs were derived by crossing low-lignan Goenbaek with high-lignan Gomazou and advancing generations using the single-seed descent method. Goenbaek is an elite variety most widely grown in South Korea for blight resistance and agronomic stability (Kim et al., 2018), while Gomazou is a variety bred in Japan for functionality (Yasumoto and Katsuta, 2006). The parental lines and RILs were planted at the National Institute of Crop Science experimental greenhouse, Miryang, South Korea, in June 2018 at 55 cm × 15 cm spacing; ten individuals of each line were harvested in August. These planting formats and periods are adapted to the weather conditions in Korea. After harvesting, the seeds of each line were pooled and examined for sesamin and sesamolin content using HPLC to estimate the total lignan content. The sample was extracted in methanol and the analytical conditions were as follows: mobile phase A was 0.1% trifluoroacetic acid in water, mobile phase B was 0.1% trifluoroacetic acid in methanol, and the analysis was run under 60% (v/v) methanol conditions. The column was a YMC Triart C-18 (1.9 µm, 2.0 mm × 50 mm, 30 °C; YMC Co., Kyoto, Japan) with a flow rate of 0.3 mL/min, and detection was under UV at 290 nm using a Dionex 3000 HPLC (Thermo Fisher Scientific, Waltham, MA, United States of America). Sesamin and sesamolin were measured using an analytical standard (Sigma-Aldrich, St. Louis, MO, United States of America).

Construction of bulks, DNA extraction, and sequencing
Based on the distribution of lignan content in the RIL population, ten individuals belonging to each of the two extremes were selected to form the high and low bulks. Genomic DNA was extracted from fresh leaves collected at the early stage of growth using the GeneAll® Exgene™ Plant SV kit (GeneAll Biotechnology, Seoul, Korea). The DNA concentration of each individual was uniformly diluted to 10 ng/µL using the Nanodrop 3,000 spectrometer (Thermo Scientific, Wilmington, DE, United States of America), and equal amounts of DNA were mixed to form the high and low bulks, respectively. After constructing the paired-end sequencing libraries, Goenbaek (read length: 101 bp), Gomazou (read length: 151 bp), high bulk (read length: 151 bp), and low bulk (read length: 151 bp) were sequenced using the Illumina HiSeq platform. Sequenced data were deposited into the National Center for Biotechnology Information (NCBI) under BioProject (https://www.ncbi.nlm.nih.gov/bioproject) as PRJNA1028140.

Genotyping and QTL-seq analysis
The raw reads were trimmed to remove residual adapter sequences using trimmomatic v0.39 (Bolger et al., 2014). The trimmed reads were aligned to the sesame reference genome (cv. Zhongzhi No. 13, v2.0) (Wang et al., 2016) using BWA v0.7.17 (Li and Durbin, 2009), and BAM files were generated with a Q-score of 30 or higher using SAMtools v1.15.1 (Li et al., 2009). To select variants for analysis, only positions with a mapping quality ≥30, a mapping depth ≥8 but <250, and homozygous alleles confirmed in both parental lines were retained from the bulk samples using VCFtools v0.1.16 (Danecek et al., 2011). For the QTL-seq study, the SNP-index was calculated according to well-established principles from previous studies (Abe et al., 2012; Takagi et al., 2013), and the analysis pipeline was applied using VCF files (Sugihara et al., 2022). To reduce the possibility of missing positions and to find major genetic regions, the variant index was calculated and compared using both Goenbaek and Gomazou as consensus genomes. The average SNP-index was calculated through sliding window analysis with a 1 Mb window size and a step size of 50 kb. Under the null hypothesis of no QTL, the statistical test of △(SNP-index) was performed for all SNP positions based on read depth. The △(SNP-index) was plotted along the physical chromosome position with 99% and 95% confidence intervals.
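The filtering thresholds and the SNP-index definition described above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study (which relied on VCFtools and the QTL-seq pipeline); the `BulkCall` record layout and the function names are hypothetical, the mapping quality is simplified to a single value per site, and the parental homozygosity check is omitted.

```python
from dataclasses import dataclass

MIN_MAPQ = 30    # mapping quality threshold stated in the text
MIN_DEPTH = 8    # inclusive lower bound on read depth
MAX_DEPTH = 250  # exclusive upper bound on read depth


@dataclass
class BulkCall:
    """One variant position in one bulk (hypothetical record layout)."""
    chrom: str
    pos: int
    ref_depth: int  # reads matching the reference (consensus) allele
    alt_depth: int  # reads carrying the alternative allele
    mapq: int       # simplification: one mapping quality value per site


def passes_filters(call: BulkCall) -> bool:
    """Apply the depth and mapping-quality thresholds from the text."""
    depth = call.ref_depth + call.alt_depth
    return call.mapq >= MIN_MAPQ and MIN_DEPTH <= depth < MAX_DEPTH


def snp_index(call: BulkCall) -> float:
    """Fraction of reads that differ from the reference allele."""
    return call.alt_depth / (call.ref_depth + call.alt_depth)


# Toy example with made-up read counts
call = BulkCall("chr6", 15_000_000, ref_depth=5, alt_depth=15, mapq=40)
if passes_filters(call):
    print(snp_index(call))
```

An index of 0 means every read matches the reference allele, and an index of 1 means every read carries the alternative allele.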

Variant mining, KEGG pathway mapping, and qRT-PCR analysis
Sequence variation in the parents and the two bulks was analyzed using the SnpEff software to annotate the changes and their impact on gene function (Cingolani et al., 2012). For functional annotation of the genes, homologous genes were searched through comparative analysis using the BLASTp v2.10.0+ software with default parameters based on Arabidopsis thaliana gene-protein information. The functional descriptions of sesame genes were inferred by referring to A. thaliana homologs, and protein sequences and gene information were downloaded from TAIR version 10 (https://www.arabidopsis.org). Candidate genes were assigned to metabolic pathways by mapping them to Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/pathway.html) pathways. To compare gene expression between the Goenbaek and Gomazou varieties, total RNA was extracted from immature seeds at 15 and 30 days after flowering using the GeneAll Ribospin™ Plant kit (GeneAll, Korea) and converted into cDNA using the RNA to cDNA EcoDry™ Premix (Oligo dT) kits (TaKaRa, Japan). Primers for qRT-PCR were designed using Primer3 (https://primer3plus.com) as shown in Supplementary Table S1. qRT-PCR was performed using SsoAdvanced™ Universal SYBR® Green Supermix (Bio-Rad Laboratories, CA, United States of America) on QuantStudio™ 5 (Applied Biosystems, United States of America). The amplification conditions were as follows: 50 °C for 2 min and denaturation at 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 1 min, and 72 °C for 15 s. The sesame housekeeping gene 18S rRNA (qFw: 5'-CGTCCCTGCCCTTTGTACAC-3' and qRv: 5'-CGAACACTTCACCGGACCAT-3') was used as a reference gene (Noguchi et al., 2008). The relative quantification of samples was calculated according to the 2^(-△△CT) method (Livak and Schmittgen, 2001) in three biological replicates, each with two technical replicates. Student's t-test was used to determine statistical significance between groups.
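As an illustration of the 2^(-△△CT) calculation referred to above, the following Python sketch computes the relative expression of a target gene in one sample against a calibrator sample, with each CT value averaged over technical replicates. The function name, argument names, and CT values are hypothetical and chosen only for the example; treating one variety as the calibrator is likewise an assumption.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative quantity by the 2^(-ddCT) method (Livak and Schmittgen, 2001).

    Each argument is a list of CT values from technical replicates.
    """
    # dCT: target gene normalized to the reference gene within each sample
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    # ddCT: sample relative to the calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Made-up CT values: the target crosses threshold two cycles earlier in the
# sample than in the calibrator, with equal reference-gene CT values.
fold = relative_expression([24.0, 24.0], [10.0, 10.0],
                           [26.0, 26.0], [10.0, 10.0])
print(fold)
```

A difference of two cycles corresponds to a four-fold difference in relative expression, because each cycle ideally doubles the amount of product.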

Development of KASP, InDel, and CAPS markers and evaluation in RILs
Based on variant annotation results between the parents, InDel, Kompetitive Allele Specific PCR (KASP), and Cleaved Amplified Polymorphic Sequences (CAPS) markers were designed using bulk-biased SNPs and InDels located in the intron of qLignan1-1, in the CDS of qLignan6-1, and in the intergenic region of qLignan11-1. Primers for the InDel marker were designed using Primer3 (https://primer3plus.com), while primers for the CAPS marker were designed using dCAPS Finder 2.0 (http://helix.wustl.edu/dcaps). PCR amplification conditions were: 95 °C for 5 min, followed by 30 cycles of 95 °C for 10 s, 60 °C for 15 s, and 72 °C for 15 s. For the CAPS marker assay, amplified products were digested with the BspHI restriction enzyme. Amplified or digested products were separated and visualized using QIAxcel (an automated capillary electrophoresis system by Qiagen, Hilden, Germany) for genotyping. The KASP primer design was performed by LGC Genomics (London, UK) for two allele-specific forward primers and a common reverse primer based on 100 bp of both flanking sequences. The KASP-PCR amplification was conducted following the KASP technology manual (LGC Genomics, Beverly, MA, United States of America): 94 °C for 15 min of activation, 10 cycles of 94 °C for 20 s with a touchdown annealing step starting at 61 °C and decreasing by 0.6 °C per cycle, and 26 cycles of 94 °C for 20 s and 55 °C for 1 min.
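The touchdown phase of the KASP program can be written out cycle by cycle. The short Python sketch below assumes that the first of the ten touchdown cycles anneals at 61 °C and that the temperature then drops by 0.6 °C for each subsequent cycle; the function name is hypothetical.

```python
def touchdown_temperatures(start=61.0, drop=0.6, cycles=10):
    """Annealing temperature (°C) for each touchdown cycle."""
    return [round(start - drop * i, 1) for i in range(cycles)]


temps = touchdown_temperatures()
print(temps)            # temperatures for the 10 touchdown cycles
print(len(temps) + 26)  # total amplification cycles including the 55 °C phase
```

Under this assumption, the last touchdown cycle anneals at 55.6 °C, one step above the constant 55 °C used in the remaining 26 cycles.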

Results
Sesamin, sesamolin, and lignan content in the parents and RILs, and the construction of two extreme bulks
The sesamin and sesamolin contents of Goenbaek, Gomazou, and 257 F7 RIL seeds were investigated, and the two compounds were combined to give the total lignan content. Goenbaek and Gomazou seeds had mean sesamin, sesamolin, and lignan contents of 2.7 and 8.2 mg/g, 1.0 and 3.1 mg/g, and 3.7 and 11.3 mg/g, respectively; Gomazou seeds had higher amounts of all components (Table 1). The 257 RIL accessions had mean sesamin, sesamolin, and lignan contents of 5.2, 2.2, and 7.5 mg/g, respectively, and the observed minimum and maximum within-population values suggested that some accessions resulted from transgressive segregation beyond the parental values (Table 1). Statistically significant correlations were observed between the three compounds (Supplementary Figure S1). In particular, sesamin (r = 0.98, p < 0.001) and sesamolin (r = 0.85, p < 0.001) showed strong positive correlations with lignan. Based on the normal distribution of lignan content in the population, 10 individuals were selected from each extreme to comprise a low bulk (Lignan-L) ranging from 1.7 to 3.5 mg/g and a high bulk (Lignan-H) ranging from 10.8 to 12.8 mg/g, with means of 2.7 and 11.7 mg/g, respectively (Figure 1, Supplementary Table S2).
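The two steps described above, summing sesamin and sesamolin into total lignan and picking the ten lowest and ten highest lines as bulks, can be sketched in Python as follows. The function names and the toy population are hypothetical; only the parental values in the example come from the text.

```python
def total_lignan(sesamin, sesamolin):
    """Total lignan content (mg/g) as the sum of the two compounds."""
    return sesamin + sesamolin


def select_bulks(lignan_by_line, n=10):
    """Return (low_bulk, high_bulk): the n lines at each phenotypic extreme."""
    ranked = sorted(lignan_by_line, key=lignan_by_line.get)
    return ranked[:n], ranked[-n:]


# Parental means reported in the text (mg/g)
print(round(total_lignan(2.7, 1.0), 1))  # Goenbaek
print(round(total_lignan(8.2, 3.1), 1))  # Gomazou

# Toy population with made-up values, for illustration only
population = {f"RIL{i:03d}": 2.0 + 0.04 * i for i in range(257)}
low_bulk, high_bulk = select_bulks(population)
```

Ranking by the phenotype and taking both tails is the standard way to form bulks for bulked-segregant analysis; ties and missing phenotypes would need explicit handling in real data.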

Whole genome resequencing of parents and lignan-high and lignan-low bulks
A total of four samples were sequenced, including parents, Lignan-H, and Lignan-L, generating a total of 80.4 Gb of raw data (Supplementary Table S3). After filtering by quality score and adapter trimming, 170,831,013 reads (101 bp in length) were mapped for Goenbaek and 216,367,131 reads (151 bp in length) for Gomazou at average depths of 51x and 103x, respectively. Similarly, Lignan-L and Lignan-H mapped 48,240,157 and 48,403,785 reads, respectively, with an average depth of approximately 21x. All samples covered more than 95% of the sesame reference genome in terms of bases for which at least one read was mapped. Sequenced reads from the parental lines were deposited in the National Center for Biotechnology Information (NCBI) database under the BioProject accession number PRJNA1028140.

Calculation of SNP/InDel-index and QTL-seq analysis
To perform the QTL-seq analysis, Goenbaek and Gomazou were used as consensus genomes, generated by replacing reference-genome positions with the variants of each parent. The variants for analysis were selected by retaining only positions at which the two parents were homozygous and polymorphic relative to each other, resulting in a total of 481,298 SNP/InDel variants (416,102 SNPs and 65,195 InDels). For these alleles mapped to consensus genomes, the SNP/InDel-index of the Lignan-L and Lignan-H bulks was calculated using a sliding window approach with a window size of 1 Mb and a step size of 50 kb. For example, if all reads at a position carry the same nucleotide as the reference (consensus) genome, the SNP/InDel-index at this position is 0. Conversely, if all reads differ from the reference nucleotide, the SNP/InDel-index is 1. The △(SNP/InDel-index), defined as △(SNP/InDel-index) = SNP/InDel-index (Lignan-H) - SNP/InDel-index (Lignan-L), was plotted across all 13 chromosomes; the plots based on the two parental consensus genomes were mirror images of each other (Supplementary Figure S2).
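The sliding-window averaging of the △(SNP/InDel-index) can be sketched as below. This is an illustrative Python version, not the QTL-seq pipeline itself: the function name and the toy positions are hypothetical, the difference is taken as high bulk minus low bulk (as in the Figure 2 legend), and windows without any variant are simply skipped.

```python
def sliding_window_delta(positions, index_high, index_low, chrom_len,
                         window=1_000_000, step=50_000):
    """Mean delta index (high bulk minus low bulk) per sliding window.

    Returns a list of (window_start, mean_delta) for windows that contain
    at least one variant.
    """
    delta = [h - l for h, l in zip(index_high, index_low)]
    result = []
    start = 0
    while start < chrom_len:
        end = start + window
        values = [d for p, d in zip(positions, delta) if start <= p < end]
        if values:
            result.append((start, sum(values) / len(values)))
        start += step
    return result


# Toy data: three variants on a 1.3 Mb chromosome (made-up values)
windows = sliding_window_delta(
    positions=[100_000, 600_000, 1_200_000],
    index_high=[1.0, 0.8, 0.6],
    index_low=[0.0, 0.2, 0.4],
    chrom_len=1_300_000,
)
print(windows[0])
```

Averaging over overlapping 1 Mb windows smooths the noise of individual variants, so that a sustained deviation of the mean from 0 marks a candidate region.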
After calculating the △(SNP/InDel-index) value using Goenbaek and Gomazou as references, 12 and 13 genomic regions, respectively, were obtained as candidate QTLs controlling lignan content at a 99% significance level, and a total of 10 common genetic regions were found (Table 2). Among them, three major QTLs with |△(SNP/InDel-index)| > 0.7, which were expected to be closely associated with the trait, were identified on chromosomes 1, 6, and 11, and were named qLignan1-1, qLignan6-1, and qLignan11-1 (Figure 2). The qLignan6-1 region comprises 1.9 Mb (14,600,000-16,500,000 bp), and the △(SNP/InDel-index) value was close to '1' when calculated on the basis of Goenbaek and close to '-1' when calculated on the basis of Gomazou (Table 2; Figure 2). This result implies that, in this QTL region, the Lignan-H pool consists of individuals that inherited the region from high-lignan Gomazou and the Lignan-L pool consists of individuals that inherited the region from low-lignan Goenbaek, and that lignan content is partially regulated by underlying genetic factors in this region. In contrast, qLignan1-1 and qLignan11-1 comprise regions of 1.65 Mb (15,100,000-16,750,000 bp) and 1.6 Mb (500,000-2,100,000 bp), respectively, and showed the opposite pattern of index values to qLignan6-1 (Table 2; Figure 2).
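One way to turn window-level values into candidate regions is to flag windows whose mean |△(SNP/InDel-index)| reaches a threshold and merge overlapping flagged windows into intervals. The Python sketch below illustrates this idea only; how the region boundaries were actually delimited in the study is not stated, and the function name, the use of a "greater than or equal" comparison, and the toy values are assumptions.

```python
def merge_significant_windows(windows, threshold=0.7, window=1_000_000):
    """Merge overlapping windows with |mean delta| >= threshold into regions.

    `windows` is a list of (window_start, mean_delta) pairs for one chromosome.
    Returns a list of (region_start, region_end) tuples.
    """
    regions = []
    for start, value in sorted(windows):
        if abs(value) < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Toy window means (made-up values)
toy = [(0, 0.1), (50_000, 0.75), (100_000, 0.8), (2_000_000, -0.9)]
print(merge_significant_windows(toy))
```

Using the absolute value keeps regions with strongly negative values, which matters here because qLignan1-1 and qLignan11-1 show the opposite sign to qLignan6-1.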

Functional classification of variants and candidate gene search
Variant analysis was performed to explore candidate SNPs/InDels or genes associated with sesame seed lignan content. First, we used the information from the reference genome to search for genes located in three QTL intervals, qLignan1-1, qLignan6-1, and qLignan11-1, and found 220, 195, and 146 genes, respectively (Table 3). A total of 2,272 variants were then selected, with counter-SNPs/InDels being defined as variants that were completely biased to one side between the bulks, representing index values of 1 or -1. Using their location, the functional classification of the variants was performed according to the annotation and expected change in function within or near the gene. As a result, a total of 11 counter-SNPs/InDels in qLignan1-1 were found to be in the intron region of one gene, while 468 and 232 counter-SNPs/InDels in qLignan6-1 and qLignan11-1 were associated with 60 and 18 genes, respectively (Table 3). To find genes that were previously reported to affect secondary metabolite synthesis, functional descriptions of the genes were obtained using BLASTP against A. thaliana protein sequences. The KEGG analysis revealed that out of a total of 73 genes, 18 were assigned KEGG orthology accession numbers and mapped to 34 pathways, with 10 genes belonging to metabolic pathways, followed by seven genes belonging to secondary metabolite biosynthesis (Table 4). Based on the genes assigned to metabolic pathways, five genes were selected for transcriptional expression analysis (Figure 3). Among them, three genes (SIN_1018420, SIN_1018429, SIN_1018493) have CDS variants with altered amino acid sequences, and two genes (SIN_1015690 and SIN_1015689) belong to the tyrosine and phenylalanine pathway. The results revealed significantly lower levels of SIN_1018420, a homolog of AT1G12240 (glycosylhydrolase family 32 protein), in immature seeds at the early stage (15 days after flowering) of Gomazou compared to those of Goenbaek. At both maturation stages, significant differences in gene expression were observed between the parental lines for SIN_1018429, a homolog of AT2G30575 (adenine phosphoribosyltransferase 4), and SIN_1018493, a homolog of AT2G30575 (los glycosyltransferase 5). Two homologs of AT4G12290, SIN_1015690 and SIN_1015689, which encode proteins of the copper amine oxidase family, exhibited different patterns of gene expression. Specifically, SIN_1015690 showed higher expression in Gomazou than in Goenbaek during the later stage of immature seeds (30 days after flowering). Conversely, SIN_1015689 showed significantly lower expression in Gomazou than in Goenbaek during the early stage of immature seeds.
Frontiers in Genetics frontiersin.org
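The counter-SNP/InDel definition given above (variants completely biased between the bulks, with a △ index of 1 or -1) and the per-gene tally can be sketched in Python. The function names, the tuple layout, and the gene labels in the toy data are hypothetical; gene assignment would in practice come from the SnpEff annotation.

```python
from collections import Counter


def is_counter_variant(index_high, index_low, tol=1e-9):
    """True when the delta index (high minus low) equals 1 or -1."""
    return abs(abs(index_high - index_low) - 1.0) <= tol


def counter_variants_per_gene(variants):
    """Count counter-variants per annotated gene.

    `variants` is an iterable of (gene_id, index_high, index_low) tuples.
    """
    counts = Counter()
    for gene_id, index_high, index_low in variants:
        if is_counter_variant(index_high, index_low):
            counts[gene_id] += 1
    return counts


# Toy data with placeholder gene labels and made-up index values
toy = [
    ("geneA", 1.0, 0.0),  # delta = 1, counted
    ("geneA", 0.0, 1.0),  # delta = -1, counted
    ("geneB", 0.9, 0.0),  # not completely biased, ignored
]
print(counter_variants_per_gene(toy))
```

Requiring complete bias is a strict criterion: with 10 individuals per bulk it selects positions where no read from the opposite parental allele was observed in either bulk.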

Additive effect of alleles and QTL stacking
To investigate the allelic effect of the three QTLs with high index values and to correlate the genotype and phenotype of individuals in the population, KASP, InDel, and CAPS markers were developed using counter-SNPs/InDels (Table 5). The allelic genotyping markers were used for variants located in the CDS region of qLignan6-1, in the intergenic region of qLignan11-1, and in the intron region of qLignan1-1. The differences in lignan content associated with each QTL allele in the population were statistically significant (Figure 4A). Specifically, qLignan1-1 and qLignan11-1 exhibited higher lignan content with the Goenbaek allele, while qLignan6-1 showed higher lignan content with the Gomazou allele. The investigation of potential interactions among the QTLs revealed that specific combinations of alleles resulted in high lignan content.
Combinations of two or more alleles were found to contribute to an increase in lignan content in comparison to single alleles. The highest lignan content was observed when all three QTL alleles were incorporated (Figure 4B). Combinations including qLignan6-1, which showed significantly higher lignan content when combined with other QTLs, were particularly noticeable. This finding provides compelling evidence for the substantial contribution of the Gomazou-derived allele to lignan content.
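The comparison of allele combinations can be illustrated by grouping lines according to which loci carry the higher-lignan allele and averaging lignan content per group. In the Python sketch below, 'A' denotes the Goenbaek allele and 'B' the Gomazou allele, as in Figure 4; the favorable allele per locus follows the text, while the function names and the lignan values are made up for illustration and do not reproduce the statistical tests used in the study.

```python
from collections import defaultdict
from statistics import mean

# Higher-lignan allele at each locus, as described in the text
FAVORABLE = {"qLignan1-1": "A", "qLignan6-1": "B", "qLignan11-1": "A"}


def favorable_loci(genotype):
    """Tuple of loci at which a line carries the higher-lignan allele."""
    return tuple(q for q in FAVORABLE if genotype.get(q) == FAVORABLE[q])


def mean_by_combination(lines):
    """Mean lignan content for each combination of favorable loci.

    `lines` is an iterable of (genotype_dict, lignan_content) pairs.
    """
    groups = defaultdict(list)
    for genotype, lignan in lines:
        groups[favorable_loci(genotype)].append(lignan)
    return {combo: mean(values) for combo, values in groups.items()}


# Toy lines with made-up lignan values (mg/g)
all_three = {"qLignan1-1": "A", "qLignan6-1": "B", "qLignan11-1": "A"}
none = {"qLignan1-1": "B", "qLignan6-1": "A", "qLignan11-1": "B"}
print(mean_by_combination([(all_three, 10.0), (all_three, 12.0), (none, 4.0)]))
```

Grouping by the full combination, rather than by the count of favorable alleles, keeps the contribution of each locus visible, which is what allows a locus such as qLignan6-1 to be singled out.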

Discussion
Identification of QTLs and the development of molecular markers that are closely linked to useful genes can be a useful tool in the breeding of new varieties by applying MAS, saving time and labor, and accelerating breeding cycles (Paran and Zamir, 2003; Collard et al., 2005). It is important that QTLs or markers are used to estimate phenotypic expression at the molecular level, particularly because direct measurement of quantitative traits is difficult (Nadeem et al., 2018). The SNP markers associated with the ahFAD2 gene, responsible for the introgression of high oleic acid content in peanut, and SSR markers for the selection of the Gpc-B1 gene, responsible for the selection of high grain protein varieties, are examples of advancing the effectiveness of breeding using marker selection resulting from QTL studies (Vishwakarma et al., 2014; Bera et al., 2018). Recently, by fusing genome-level information on components such as secondary metabolites with information on the biosynthesis of compounds using advanced NGS technology, research is expanding into a new area called the metabolome (Keurentjes, 2009; Carreno-Quintero et al., 2013). For example, 4,681 metabolic quantitative trait loci (mQTLs) were identified as profiling metabolites that accumulate in seeds at different developmental stages in rice (Li et al., 2019), and 42 loci were identified from 1,098 associations in a metabolite-based genome-wide association study (mGWAS), thus suggesting breeding strategies for flavonoid pathway metabolites in wheat (Chen et al., 2020).
Finding a resource that is suitable for the target trait and generating a genetic population to find the genes that control the trait are necessary to achieve successful QTL results. In this study, to find the regulators of lignan content in sesame, an F7 RIL population was established using Goenbaek, an elite variety with stable cultivation but low lignan content, and Gomazou, a variety with high lignan content, as parental lines. We then compared sequence information from pools of transgressively segregating individuals with high and low lignan content for the QTL analysis. Three major QTLs (qLignan1-1, qLignan6-1, and qLignan11-1) were identified from the genetic regions that overlapped when each parental genome was used in turn as the reference for calculating sequence frequencies, an approach intended to avoid errors caused by heterozygous or missing positions in the mapping process. In previous studies, QTL results using the highly biased variant frequency index in separated bulks have been shown to be reliable in comparison to traditional QTL analysis in rice, tomato, and maize (Chen et al., 2021; Topcu et al., 2021; Yang et al., 2021). The identified qLignan11-1 is also located in close proximity to qSim_11.1 and qSmol_11.1, which were reported to have high phenotypic effects (67.69% and 46.05%, respectively) on sesamin and sesamolin content using F8 RILs in sesame (Xu et al., 2021). Although using total lignan, the combined content of sesamin and sesamolin, as an indicator is a potential limitation of this study, the strong correlation between the components (Supplementary Figure S1) supports our finding of comprehensive loci that can be used for breeding. However, a genetic approach to the synthesis of sesamin and sesamolin is needed for further functional analysis. The relatively large size of the QTL regions, which ranged from 1.6 to 1.9 Mb and contained 140-220 genes, limited the selection of specific candidate genes. To overcome this limitation, our study followed a two-step approach to candidate gene selection. First, we noted that some variants within the QTL regions were extremely segregated between the bulks, whereas the SNP-index values at most positions were intermediate (heterozygous). These extreme variants were defined as counter-SNPs/InDels, with 11, 468, and 232 variants within each QTL region, spanning one, 60, and 18 genes, respectively (Table 3). Then, we conducted a comparative analysis based on the gene annotation of A. thaliana. By selecting genes belonging to secondary metabolic pathways, which are expected to be functionally altered due to variations in genomic regions affecting protein coding (Table 4), we validated their expression levels using parental lines with large differences in lignan content. As a result, all five genes, SIN_1018420, SIN_1018429, SIN_1018493, SIN_1015690, and SIN_1015689, were differentially expressed in the seeds (Figure 3), but further functional studies are needed to elucidate the trait-associated roles of these genes. Our results suggest that despite the large number of genes distributed among the potential loci, the gene and pathway databases can be helpful in the logical selection of promising candidate genes.
Quantitative traits are determined by multiple genes, each of which may have a large or small effect. Since a quantitative trait is often determined by the interaction of multiple genes with environmental influences, it is preferable to use methods such as QTL-seq to find useful genetic regions focusing on major QTLs, although there are limitations in detecting minor QTLs (Dong et al., 2021; Sheng et al., 2021). While the elucidation of gene functions at the molecular level is important, it is also important that QTL study results are practically applied in breeding through the use of markers derived from major genes for the efficient selection of pedigrees, reducing time and labor (Xu et al., 2018; Yang et al., 2019). This study verified the reliability of the QTLs with KASP, InDel, and CAPS markers, showing that the allele derived from the lignan-rich Gomazou at qLignan6-1 and the two alleles derived from Goenbaek at qLignan1-1 and qLignan11-1 are dominant. The influence of the allele derived from the high-lignan Gomazou is higher than that of the two QTLs derived from the low-lignan Goenbaek, suggesting that the QTLs derived from Goenbaek are likely to be minor QTLs. Interactions between alleles were also analyzed to exploit the additive allele effect, and the dominant form of qLignan6-1 was found to have greater synergy when combined with other QTLs. It is therefore expected that qLignan6-1 will play an important role in controlling lignan content, in addition to additive effects with other loci. The discovery of qLignan6-1 will be useful in the breeding process as a marker for the introgression of useful genes from high-lignan resources into a new variety.
Furthermore, lignans in sesame have been shown to be functional components with antioxidant, anti-dementia, and antihypertensive activities, and the genetic mechanisms required for lignan synthesis have been reported using QTL, GWAS, and transcriptomics in several studies (Harada et al., 2020; Xu et al., 2021; Dossou et al., 2023). CYP92B14, an enzyme that converts sesamin to sesamolin and sesaminol, which was elucidated through functional characterization (Harada et al., 2020), and the SiSNT1 gene located on chromosome 11, inferred through genetic mapping (Xu et al., 2021; Dossou et al., 2023), may explain the genetic mechanisms of lignan synthesis in sesame. However, these mechanisms are still poorly understood compared to those in other major crops. In this study, we found two novel genetic contributors that regulate lignan content, qLignan1-1 and qLignan6-1, along with qLignan11-1 at the same location as in a previous study. By showing the utility of marker interactions that can be used in practical breeding strategies, we are confident that these results will be useful as a basic resource for the development of high-functional lignan varieties.

FIGURE 2
QTL-seq analysis for the identification of major effect QTLs controlling lignan content in sesame. (A) SNP/InDel-index plots for high- (green) and low-lignan (yellow) bulks, and △SNP/InDel-index (blue) based on 'Goenbaek' as a reference. (B) SNP/InDel-index plots for high- (green) and low-lignan (yellow) bulks, and △SNP/InDel-index (blue) based on 'Gomazou' as a reference. The average index value was plotted as red lines using a sliding window approach of 1 Mb intervals with 50 kb increments. The △SNP/InDel-index, obtained by subtracting the SNP/InDel-index of the low-lignan bulk from the SNP/InDel-index of the high-lignan bulk, was calculated with the statistical confidence interval under the null hypothesis of no QTL (orange, p < 0.01; and light green, p < 0.05). The three major effect QTLs with |△SNP/InDel-index| ≥ 0.7 were shaded in red on chromosomes 1, 6, and 11, and designated as qLignan1-1, qLignan6-1, and qLignan11-1, respectively.
Table 4 footnotes: a Arabidopsis thaliana genes of TAIR, version 10. b KO: Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology.
FIGURE 3
qRT-PCR analysis of selected genes mapped in the KEGG pathway for the discovery of candidate genes. Relative expression of five genes in seeds at 15 and 30 days after flowering (FT) between Goenbaek (GB, orange) and Gomazou (GM, green). Error bars represent standard errors of three biological replicates. Asterisks indicate statistically significant differences: *p < 0.05, **p < 0.01, and ***p < 0.001.

FIGURE 4
Comparison of lignan content effects across alleles between three QTLs. (A) Effect of individual alleles on lignan content. Genotype 'A' refers to alleles derived from the parent Goenbaek, and 'B' refers to alleles derived from Gomazou. Asterisks indicate statistically significant differences: **p < 0.01 and ***p < 0.001. (B) Effect of combinations of alleles on lignan content. Each QTL designation denotes alleles in the group with higher lignan content as large. Post hoc tests were used to indicate statistical significance.

TABLE 1
Variation of sesamin, sesamolin, and lignan content in the parental lines and recombinant inbred line (RIL) population.

FIGURE 1
Frequency distribution of lignan content in the Goenbaek × Gomazou recombinant inbred line (RIL) population. Arrows indicate the mean lignan content of the parental varieties. The average lignan content is shown for the low and high bulks, respectively.

TABLE 2
Number of significant loci and SNP/InDel from QTL-seq analysis.

TABLE 3
Summary of the number of counter-SNPs/InDels near/or within genes in QTL regions.

TABLE 4
Lists of 79 genes functionally annotated in three QTLs and functional description based on Arabidopsis thaliana homologues with KEGG orthology.

TABLE 5
DNA marker information for identifying alleles that control lignan content in sesame.