Construction of a genetic map and QTL mapping of seed size traits in soybean

Soybean seed size and seed shape traits are closely related to plant yield and appearance quality. In this study, 186 individual plants of the F2 generation derived from crosses between Changjiang Chun 2 and JiYu 166 were selected as the mapping population to construct a molecular genetic linkage map, and the phenotypic data of hundred-grain weight, seed length, seed width, and seed length-to-width ratio of soybean under three generations of F2 single plants and F2:3 and F2:4 lines were combined to detect the QTL (quantitative trait loci) for the corresponding traits by ICIM mapping. A soybean genetic map containing 455 markers with an average distance of 6.15 cM and a total length of 2799.2 cM was obtained. Forty-nine QTLs related to the hundred-grain weight, seed length, seed width, and seed length-to-width ratio of soybean were obtained under three environmental conditions. A total of 10 QTLs were detected in more than two environments with a phenotypic variation of over 10%. Twelve QTL clusters were identified on chromosomes 1, 2, 5, 6, 8, 13, 18, and 19, with the majority of the overlapping intervals for hundred-grain weight and seed width. These results will lay the theoretical and technical foundation for molecularly assisted breeding in soybean seed weight and seed shape. Eighteen candidate genes that may be involved in the regulation of soybean seed size were screened by gene functional annotation and GO enrichment analysis.


Introduction
Soybean (Glycine max L.) is one of the most economically important crops in the world and is also used as a model plant for research on legumes. It is a rich source of both edible oil and plant-based protein, owing in part to its ability to fix atmospheric nitrogen through a symbiotic interaction with soil microorganisms (Tzen et al., 1993). Soybean is widely grown and consumed globally and constitutes nearly 28% of vegetable oil and 70% of protein meals worldwide (Sulistyo et al., 2021). As a major oilseed crop, it supplies plant protein and oil for humans and animals. Compared with other major crops, such as rice, wheat, and maize, the yield of soybean is approximately two to three times lower. As a result, increasing soybean yield is an important and urgent task in soybean breeding (Liu et al., 2020a; 2020b). Soybean yield is a complex trait determined by many components, among which seed size is one of the primary indexes. In China, soybean production has continuously declined, with a considerably low yield increase in the past 50 years. Moreover, China imports >80% of the soybean for its total domestic use; hence, increasing domestic soybean production is a prerequisite for making the country self-sufficient (Liu et al., 2018). Plant breeders target different yield-related traits to increase soybean production. In this context, seed weight is one of the most important yield-related traits for increasing seed yield in soybean; however, it is a complex quantitative trait governed by polygenes and is highly influenced by the environment, which makes its selection difficult for plant breeders (Yao et al., 2015).
In crop breeding, seed size is one of the most important agronomic traits to be considered. For instance, the seeds of cultivated crops are usually larger than those of their corresponding wild ancestors, which shows parallel selection (Doebley et al., 2006). Seed size can be described by three main dimensions: length, width, and thickness (Xing and Zhang, 2010). Seed size and shape play a key role in determining seed weight and yield in soybean (Salas et al., 2006; Yan et al., 2017). Seed appearance traits, including seed length (SL), seed width (SW), and seed thickness (ST), as well as seed shape traits such as the seed length-to-width (SLW), length-to-thickness (SLT), and width-to-thickness (SWT) ratios, affect seed yield (Xu et al., 2011; Liang et al., 2016). Seed size, which is measured with hundred-grain weight (HGW), is a fitness trait that is essential for environmental adaptation (Tao et al., 2017).
Quantitative trait locus (QTL) analysis provides a powerful tool for soybean breeders to search for new sources of variation and investigate the genetic factors underlying quantitatively inherited traits. However, across various genetic backgrounds and conditions, only a small number of stable QTL clusters associated with seed and yield-related traits, including seed length (SL), seed width (SW), seed thickness (ST), length-to-width (SLW) ratio, length-to-thickness (SLT) ratio, width-to-thickness (SWT) ratio, and hundred-grain weight (HGW), have been found. Therefore, for effective use of QTLs in marker-assisted breeding, it is essential to find QTLs and confirm them in a variety of backgrounds and conditions. Apuya et al. constructed the first soybean genetic linkage map using an F2 mapping population; the map contained 11 RFLP markers distributed across four linkage groups (Apuya et al., 1988). QTL localization results for many traits have since been reported and collected in the soybean database. According to the latest database (SoyBase, http://www.soybase.org), 304 QTLs related to HGW have been localized, and 52, 32, and 70 QTLs are related to seed length, seed width, and seed length-to-width ratio, respectively (http://soybase.ncgr.org). In 1996, Mansur et al. detected three QTLs associated with HGW using a recombinant inbred line (RIL) population (Mansur et al., 1996). Kulkarni et al. identified nine QTLs for HGW, localized on eight linkage groups, using RILs constructed from a cross of Williams 82 and PI366121 (Kulkarni et al., 2017). Jun et al. used RIL populations derived from a cross between LSZZH and N493 for QTL localization of seed length and seed width and localized eight QTLs related to seed length on six linkage groups and nine QTLs related to seed width on eight linkage groups (Jun et al., 2014). Salas et al. performed QTL localization for seed length using an RIL population obtained from a cross between Minsoy and Noir 1; a total of 13 QTLs associated with seed length were located in six linkage groups (Salas et al., 2006). Kumar et al. used F2 and F2:3 mapping populations derived from vegetable-type and seed-type soybeans; a total of 42 QTLs were identified, distributed on 13 chromosomes (Kumar et al., 2023). Using a population of 300 RILs derived from a cross between soybean PI595843 (PI) and WH, Xu et al. detected a total of 38 QTLs related to HGW, identified four major QTLs, and identified six candidate genes (Xu et al., 2023). Elattar et al. used two RIL populations, LM6 and ZM6, to detect 48 mQTLs associated with HGW and 99 mQTLs associated with seed shape traits on 19 soybean chromosomes under four environments (Elattar et al., 2021). Chen et al. used an RIL population of 364 individuals from Zhongdou 41 × ZYD 02.878 to study HGW and other traits in soybean; a total of 12 QTLs associated with HGW were identified (Chen et al., 2023). To date, only a few papers have focused on mapping QTLs for seed size and shape using a high-density map in various genetic backgrounds of soybean (Karikari et al., 2019). In addition, most previous publications did not report candidate genes for seed shape and seed weight (Zhang et al., 2004; Niu et al., 2013; Kato et al., 2014; Xie et al., 2014; Wu et al., 2018). The present study aimed to construct a relatively high-density map and to map QTLs for seed size traits using a population derived from a cross between Changjiang Chun 2 (CJC2) and JiYu 166 (JY166) in three environments. The results are expected to be useful for marker-assisted selection (MAS) and to improve our understanding of the genetic mechanisms underlying seed size traits in soybean.

Plant materials
Changjiang Chun 2 (CJC2) is a high-yielding, high-protein cultivar with larger seeds and an HGW of 23.3 g, which was released in Chongqing, China. JiYu 166 (JY166) is a widely adapted germplasm in China with an HGW of 17.6 g. The significant difference in seed size between the two parents makes this cross well suited to studying seed size traits. In this study, 186 single plants from the F2 population derived from the cross between CJC2 and JY166 were used as the genotyping population. The F2 and F2:3 populations were planted in Chongqing in the summers of 2021 and 2022 (21CQ and 22CQ), respectively, and the F2:4 population was planted in the winter of 2022 in Yunnan (22YN), China. The F2 population was grown as single plants. The F2:3 and F2:4 families were each sown in a single row, with a row length of 1 m, a row width of 0.5 m, and a plant spacing of 0.2 m. All populations received general field management. The material was harvested after maturity for further examination of seed size traits.

Trait measurements
Seed shape and hundred-grain weight were evaluated for three generations. Seed length (SL), seed width (SW), hundred-grain weight (HGW), and seed length-to-width ratio (SLW) were determined by image analysis using the SC-G software (Wanshen Detection Technology Co., Ltd., Hangzhou, China). Approximately 40 soybean seeds were spread on the white plate of a flatbed scanner (Eloam Technology Co., Ltd., Shenzhen, China). The scanner was set to inverse scanning and positive film mode, with 24-bit color and a resolution of 300 DPI. The image was processed with the SC-E software (Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China, www.wseen.com). First, the image was converted to a 24-bit grayscale image immediately after scanning and stored automatically in PNG format for further analysis. The image obtained was 3,410 × 2,400 pixels in size. Second, the background was subtracted to remove the effect of background texture, and any overlapping soybean seeds were segmented (Zhong et al., 2009). After that, seed parameters were extracted and stored, and each soybean seed was mapped individually. Finally, the length, width, and HGW of the seeds were displayed based on the stored parameters.
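As a rough illustration of the geometry behind such image-based measurements, the sketch below converts a length in pixels to millimetres at the stated scan resolution of 300 DPI and derives SLW from SL and SW. The function names and the example pixel values are hypothetical, and this is not the procedure of the SC-G or SC-E software, whose internals are not described here.

```python
MM_PER_INCH = 25.4  # one inch is defined as 25.4 mm


def pixels_to_mm(pixels: float, dpi: float = 300.0) -> float:
    """Convert a length in pixels to millimetres for a scan made at `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * MM_PER_INCH / dpi


def length_to_width_ratio(length_mm: float, width_mm: float) -> float:
    """Seed length-to-width ratio, SLW = SL / SW."""
    if width_mm <= 0:
        raise ValueError("width must be positive")
    return length_mm / width_mm


# Hypothetical seed outline of 95 x 80 pixels (illustrative values only).
sl = pixels_to_mm(95)
sw = pixels_to_mm(80)
slw = length_to_width_ratio(sl, sw)
```

Because SLW is a ratio of two lengths, it does not depend on the pixel-to-millimetre scale, whereas SL and SW do.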

DNA extraction and SSR marker detection
Genomic DNA was extracted from young leaves collected from the 186 single plants of the F2 population, the two parents, and F1 plants, as described in Zhang et al. (2005). A total of 2,933 SSR primer pairs, derived from the soybean database SoyBase (http://www.soybase.org/) and Song et al. (2010), were synthesized by Biotech Bioengineering Co., Ltd. (Shanghai, China) (some of these BARCSOYSSR primers were renamed as SWU in this study, as detailed in Supplementary Table S1). PCR amplification was performed as described by Zhang et al. (2002). Primers showing polymorphism between the two mapping parents were used to genotype the single plants of the F2 population. A band type identical to CJC2 was recorded as A, a band type identical to JY166 was recorded as B, a heterozygous band type was recorded as H, and a missing band was recorded as U. The results were then gathered for further analysis.

Map construction and QTL detection
Marker linkage analysis was performed using the mapping software JoinMap 4.0. The genetic linkage map was constructed with an LOD score of 4.0 and a recombination frequency of 0.4, and recombination frequencies were converted to map distances using the Kosambi mapping function. The mapping software MapChart 2.2 was used to draw the linkage maps. The genetic linkage map was used to identify QTLs for seed size traits with the multiple-QTL mapping method of MapQTL 6.0. A stringent genome-wide LOD threshold at a p-value of 0.05 was used to declare significant QTLs for each trait, based on the results of a 1,000-permutation test. Additive effects were defined with respect to the alleles of CJC2. Thus, positive additive effects indicated that the alleles of CJC2 increased the phenotypic trait values, and negative values indicated that the alleles of CJC2 decreased them. An LOD score of at least 3.0 was used as an indicator for the presence of a QTL.
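For readers unfamiliar with the Kosambi mapping function, the following minimal sketch shows the standard conversion between a recombination fraction r and a map distance in centimorgans, d = 25 ln[(1 + 2r)/(1 - 2r)], together with its inverse. The function names are illustrative only; the map in this study was built with JoinMap 4.0, not with this code.

```python
import math


def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) for a recombination fraction r in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse conversion: recombination fraction for a distance in cM."""
    if d_cm < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * math.tanh(d_cm / 50.0)


# Map distance corresponding to the recombination-frequency threshold of 0.4.
threshold_cm = kosambi_cm(0.4)
```

For small r the Kosambi distance is close to 100r cM; it grows faster than r as r approaches 0.5, which reflects the correction for undetected double crossovers under interference.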
The QTLs we found were named with the letter q, the trait name, the chromosome number, and the order of the QTL identified on the same chromosome. For example, for the QTL denoted as qHGW01.1, q indicates QTL, HGW stands for the trait (hundred-grain weight), 01 shows the chromosome on which the QTL was detected, and .1 indicates the order of the QTL identified on that chromosome for the trait.

FIGURE 2 Linkage map and QTL for seed size traits derived from the (CJC2 × JY166) population.
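The QTL naming convention used in this section (q, trait abbreviation, two-digit chromosome number, and order on the chromosome) can be sketched as a small parser and formatter. The regular expression and the function names below are assumptions made for illustration and are not part of any software used in the study.

```python
import re
from typing import NamedTuple

# q + trait letters + two-digit chromosome + "." + order, e.g. qHGW01.1
QTL_NAME = re.compile(r"^q([A-Za-z]+)(\d{2})\.(\d+)$")


class QtlName(NamedTuple):
    trait: str
    chromosome: int
    order: int


def parse_qtl_name(name: str) -> QtlName:
    """Split a QTL name such as 'qHGW01.1' into trait, chromosome, and order."""
    match = QTL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid QTL name: {name!r}")
    return QtlName(match.group(1), int(match.group(2)), int(match.group(3)))


def format_qtl_name(trait: str, chromosome: int, order: int) -> str:
    """Build a QTL name from its parts, zero-padding the chromosome number."""
    return f"q{trait}{chromosome:02d}.{order}"
```

For instance, parsing qHGW01.1 yields the trait HGW, chromosome 1, and order 1, and formatting those parts again returns the original name.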

Candidate gene prediction analysis
The genes and functional annotations within candidate QTL intervals were obtained from SoyBase (http://www.soybase.org), and Gene Ontology (GO) enrichment analysis was performed to analyze the families and subfamilies, molecular functions, biological processes, and pathways of the genes in the identified QTLs. Finally, candidate genes with seed size-related functions were screened.

Results

Marker polymorphism analysis
Out of 2,933 SSR primer pairs, 518 showed polymorphism between the mapping parents. The average polymorphism rate across all primers was 17.7%; among them, the Sat primers had the highest polymorphism rate (28.2%), while the SWU primers had the lowest (13.0%). The 518 polymorphic primer pairs were used to detect the marker genotypes of the F2 population, and 455 marker loci were obtained.
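The reported average polymorphism rate follows directly from the two primer counts given above. The short check below is only a worked calculation with a hypothetical helper name.

```python
def polymorphism_rate(polymorphic: int, screened: int) -> float:
    """Percentage of screened primer pairs that are polymorphic."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100.0 * polymorphic / screened


# 518 polymorphic pairs out of 2,933 screened pairs (counts from the text).
rate = polymorphism_rate(518, 2933)
print(f"{rate:.1f}%")
```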

Genetic map construction
Using the obtained marker loci, a linkage map containing 27 linkage groups and 455 loci was constructed. The map spanned 2799.2 cM. The longest linkage group was 217.4 cM, the shortest was 12.2 cM, and the average distance between markers was 6.15 cM. Chromosome 2 carried the most markers (51), and chromosome 16 carried the fewest (10) (Table 1; Figure 1).
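The average marker distance can be checked from the map length and the number of loci. The sketch below assumes that the reported value is the total length divided by the number of markers; it also shows the alternative of dividing by the number of marker intervals (markers minus linkage groups), which gives a somewhat larger figure. The helper name is hypothetical.

```python
def mean_spacing_cm(total_cm: float, n_markers: int, n_groups: int = 0) -> float:
    """Mean spacing in cM.

    With n_groups=0 the map length is divided by the marker count; with
    n_groups > 0 it is divided by the number of intervals between adjacent
    markers, which is n_markers - n_groups.
    """
    denominator = n_markers - n_groups
    if denominator <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / denominator


per_marker = mean_spacing_cm(2799.2, 455)                # length / markers
per_interval = mean_spacing_cm(2799.2, 455, n_groups=27)  # length / intervals
```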

Trait phenotype analysis
The results of phenotypic data analysis for the three environments are presented in Table 2. The HGW, seed length, and seed width of CJC2 were higher than those of JY166, and all four traits segregated in the population, with coefficients of variation ranging from 3.30% to 15.42%. Transgressive segregation was observed for each trait. The frequency distribution histograms showed that the four traits were approximately normally distributed in the three environments, which is consistent with the inheritance pattern of quantitative traits (Figure 1).
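The coefficient of variation quoted above is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; it assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the example values are invented purely for illustration.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV (%) = 100 * sample standard deviation / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean


# Invented HGW-like values in grams, not data from this study.
example_cv = coefficient_of_variation([10.0, 12.0, 14.0])
```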
Correlation analysis (Figure 2) showed that the seed size traits were correlated with one another. In all three environments, HGW was positively correlated with SL and SW, and SL was positively correlated with SW. The correlation coefficient between HGW and SW was the largest, reaching a maximum of 0.888. This relationship can guide the selection of suitable varieties in breeding. In addition, since SLW is defined as SL/SW, it is not surprising that SLW was positively correlated with SL and negatively correlated with SW. HGW was more closely correlated with SW than with SL, and the size of this difference varied among the three environments. Moreover, in the 22YN environment, the negative correlation between SW and SLW was less significant, which deserves further investigation (Table 3).
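The text does not state which correlation coefficient was used. Assuming the usual Pearson product-moment coefficient, a minimal implementation looks like the following; the function name is illustrative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```

Applied to per-line trait means within one environment, a value near 1 (such as the 0.888 reported for HGW and SW) indicates that lines with wider seeds also tend to have heavier seeds.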

QTLs identified for seed size traits
For hundred-grain weight, six QTLs (Table 4) were identified and mapped on 10 chromosomes, explaining 7.8% to 19.8% of the phenotypic variation. qHGW01.1, qHGW02.1, and qHGW06.1 were identified in two environments, with maximum phenotypic variation explained of 16.30%, 14.60%, and 16.40%, respectively. The favorable alleles of all six QTLs originated from CJC2.

Candidate gene prediction within stable QTLs
The QTL localization described above showed that QTLs for multiple traits overlap. A total of 12 QTL clusters were detected on chromosomes 1, 2, 5, 6, 8, 13, 18, and 19, and four QTL clusters involving HGW and SW were detected and named qWH. Among these QTL clusters, one cluster named qWH06.1, which was detected stably in two environments (21CQ and 22CQ), was selected for further detection of candidate genes. qWH06.1 was located on chromosome 6 from 4953364 bp to 7378025 bp, an interval in which 285 genes were detected. Those genes were then sorted and processed for GO enrichment analysis.
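The idea of overlapping QTL intervals, and the size of the qWH06.1 region, can be made concrete with a few interval helpers. This is a sketch under the assumption that intervals are closed ranges of base-pair coordinates on the same chromosome; the helper names are hypothetical, and only the qWH06.1 coordinates and gene count come from the text.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp) on one chromosome


def interval_length(iv: Interval) -> int:
    """Length of an interval in base pairs (end minus start)."""
    return iv[1] - iv[0]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share any position."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_length(a: Interval, b: Interval) -> int:
    """Shared length in base pairs, or 0 if the intervals do not overlap."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


QWH06_1: Interval = (4953364, 7378025)  # coordinates reported in the text
length_bp = interval_length(QWH06_1)     # about 2.42 Mb
genes_per_mb = 285 / (length_bp / 1e6)   # 285 genes reported in the interval
```

Two QTLs for different traits would be grouped into one cluster when `overlaps` returns True for their intervals on the same chromosome.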
GO enrichment analysis revealed that most genes detected in qWH06.1 are associated with the cellular component and molecular function categories. Most of these genes are concentrated in the nucleus, plasma membrane, protein binding, and other terms (Figure 3). After gene function annotation screening, a total of 18 candidate genes that may be involved in regulating seed size and weight traits were obtained (Table 5). The 18 genes encode the homeodomain-leucine zipper, the zinc finger transcription factor GATA, the serine/threonine protein kinase, the CXC domain, the cysteine-rich domain, the galactosylgalactosylxyloglucosan 3-beta-glucuronosyltransferase, and others.

FIGURE 3 GO term enrichment analysis of the genes located within qWH06.1.

Frontiers in Genetics (frontiersin.org)

Discussion
Soybean seed size traits are mostly quantitative traits controlled by polygenes. At present, there are many QTL mapping studies for quantitative traits in soybean, but most of these QTLs have not been applied to soybean breeding (Csanádi et al., 2001). In this study, the F2 population derived from the cross between CJC2 and JY166 was used for QTL mapping, and a soybean genetic map with a total length of 2799.2 cM was developed, containing 455 markers with an average map distance of 6.15 cM. The markers in this map are conducive to subsequent fine mapping and provide a basis for future marker-assisted selection, mining of favorable alleles, and exploration of the mechanisms regulating seed development. Based on the average values across the three environments, HGW in the population ranged from 11.57 g to 23.05 g, indicating large variation in this population, and its coefficient of variation was the largest among the four traits, ranging from 10.43% to 15.42%. A total of six QTLs related to HGW were detected in the population, and all the favorable alleles were from CJC2. Among them, qHGW13.2 has been reported in a previous study (Wu et al., 2020). A total of nine QTLs related to seed length were detected, explaining 8.1% to 21.4% of the phenotypic variation. So far, 29 QTLs related to seed length have been published, but few of them have been reported repeatedly. None of the seed length QTLs detected in this study has been reported previously. The extremely high temperatures in Chongqing in the summer of 2022 affected late seed development, and generation advancement in Yunnan in the winter of 2022 shortened the growth period of soybean. Both had a negative influence on seed width. It was therefore speculated that seed width growth might occur late in seed development. A total of 16 seed width QTLs were detected, explaining 9.00% to 31.30% of the phenotypic variation. Two of them have been reported previously. Hina et al. (2020) located a QTL related to seed width on chromosome 1 of soybean, from 49641073 to 51122075 bp, which coincided with qSW01.1 detected in this study, and qSW10.1 was also reported in a previous study. The agreement between the QTLs identified in our study and published QTLs for soybean seed size-related traits supports the accuracy of these QTLs. A total of 46 QTLs were identified for the first time, and these new QTLs have potential value for the development of improved soybean varieties.
In the present study, we found overlapping QTLs for multiple traits, with 12 QTL clusters located on chromosomes 1, 2, 5, 6, 8, 13, 18, and 19, and each QTL cluster was associated with two or more traits related to seed size. QTL clusters may represent linkage of genes/QTLs or pleiotropic effects of a single QTL in the same genomic region. These QTL clusters can lay a foundation for further mining of target genes controlling seed size. In addition, QTLs for HGW and SW were detected in the same region on multiple chromosomes, and correlation analysis showed that the correlation coefficient between HGW and SW was the largest. This suggests that a pleiotropic gene may coordinate the control of SW and HGW, and the relationship between the developmental mechanisms of SW and HGW deserves further exploration.
The QTL intervals related to seed size traits detected here were compared with those in the public soybean database. In addition to overlapping with intervals for seed size-related traits, many QTLs overlapped with QTLs for protein content, oil content, days to flowering, and maturity. Therefore, it is hypothesized that genes regulating the synthesis or metabolism of protein and oil may be associated with genes regulating soybean seed size. Days to flowering and maturity were suggested by Cober and Morrison (2010) to be directly related to soybean yield, suggesting potential common genetic factors for these traits and the need for further research on these regions.
Candidate genes are mainly related to cellular components, catalytic activity, transport, metabolism, and cellular processes. Glyma.06g072200 encodes a WD40 protein; WD40 proteins play an important role in plant growth and development, seed development, and hormone responses. For example, GTS1 (a WD40 protein) acts together with Nop16 and L19e to regulate seed germination and biomass accumulation (Gachomo et al., 2014). Overexpression of GATA, which is encoded by Glyma.06g086400, not only inhibited growth and biomass accumulation for most phenotypic traits but also altered the expression of some major TFs and pathway genes involved in the secondary cell wall (SCW) and programmed cell death (Ren et al., 2021). Glyma.06g067900 is involved in encoding p300 and CBP, transcriptional co-regulators that take part in a wide range of cellular gene expression programs controlling cell differentiation, growth, and homeostasis (Bordoli, 2001). Glyma.06g069200 encodes the plastid-localized DEAD-box RNA helicase 22, which regulates the accumulation of large quantities of storage products in seeds (Kanai et al., 2013). Glyma.06g068100 encodes a threonine/tyrosine protein kinase; such kinases act on different physiological substrates and have roles in seed oil accumulation (Ramachandiran et al., 2018), which provides energy reserves and nutrients for seed germination and postgermination growth.
Members of glycosyltransferase family 43, encoded by Glyma.06g08380 and Glyma.06g091200, were suggested to play an important role in the synthesis of the hemicellulose glucuronoxylan (GX) in plant secondary cell walls (Wu et al., 2010). La is expressed during plant development, and inactivation of AtLa1 in Arabidopsis leads to an embryo-lethal phenotype in which defective embryos arrest at the early globular stage (Fleurdépine et al., 2007). Thus, it is hypothesized that La, encoded by Glyma.06g082200, plays a role in seed development. Glyma.06g083400 encodes TSO1, which is involved in maintaining stable repression of gene expression through cell division and functions to regulate cell proliferation (Reyes and Grossniklaus, 2003); it is thus hypothesized to increase HGW by regulating cell proliferation.
Therefore, based on gene function, GO analysis, and a literature search, the 18 aforementioned genes are considered candidates with the potential to regulate soybean seed size traits. However, the exact mechanism remains to be studied further. The markers and candidate genes identified in this study provide an important theoretical basis and genetic resources for the improvement of soybean seed size traits.

TABLE 1
Distribution of markers on chromosomes in a map developed from the F2 population.

TABLE 2
Characteristics of seed size traits in the population in three environments.

TABLE 3
Correlation analysis of soybean size traits.

TABLE 4
QTL identified for seed size traits in three environments.

TABLE 5
Candidate genes identified in seed size QTL regions.