Identification of Novel Genomic Regions for Biofortification Traits Using an SNP Marker-Enriched Linkage Map in Wheat (Triticum aestivum L.)

Micronutrient and protein malnutrition is recognized among the major global health issues. Genetic biofortification is a cost-effective and sustainable strategy to tackle malnutrition. Genomic regions governing grain iron concentration (GFeC), grain zinc concentration (GZnC), grain protein content (GPC), and thousand kernel weight (TKW) were investigated in a set of 163 recombinant inbred lines (RILs) derived from a cross between cultivated wheat variety WH542 and a synthetic derivative (Triticum dicoccon PI94624/Aegilops tauschii [409]//BCN). The RIL population was genotyped using 100 simple-sequence repeat (SSR) and 736 single nucleotide polymorphism (SNP) markers and phenotyped in six environments. The constructed genetic map had a total genetic length of 7,057 cM. A total of 21 novel quantitative trait loci (QTL) were identified in 13 chromosomes representing all three genomes of wheat. The trait-wise highest number of QTL was identified for GPC (10 QTL), followed by GZnC (six QTL), GFeC (three QTL), and TKW (two QTL). Four novel stable QTL (QGFe.iari-7D.1, QGFe.iari-7D.2, QGPC.iari-7D.2, and QTkw.iari-7D) were identified in two or more environments. Two novel pleiotropic genomic regions falling between Xgwm350–AX-94958668 and Xwmc550–Xgwm350 in chromosome 7D harboring co-localized QTL governing two or more traits were also identified. The identified novel QTL, particularly stable and co-localized QTL, will be validated to estimate their effects on different genetic backgrounds for subsequent use in marker-assisted selection (MAS). Best QTL combinations were identified by the estimation of additive effects of the stable QTL for GFeC, GZnC, and GPC. A total of 11 RILs (eight for GZnC and three for GPC) having favorable QTL combinations identified in this study can be used as potential donors to develop bread wheat varieties with enhanced micronutrients and protein.

Micronutrient and protein malnutrition is recognized among the major global health issues. Genetic biofortification is a cost-effective and sustainable strategy to tackle malnutrition. Genomic regions governing grain iron concentration (GFeC), grain zinc concentration (GZnC), grain protein content (GPC), and thousand kernel weight (TKW) were investigated in a set of 163 recombinant inbred lines (RILs) derived from a cross between cultivated wheat variety WH542 and a synthetic derivative (Triticum dicoccon PI94624/Aegilops tauschii [409]//BCN). The RIL population was genotyped using 100 simple-sequence repeat (SSR) and 736 single nucleotide polymorphism (SNP) markers and phenotyped in six environments. The constructed genetic map had a total genetic length of 7,057 cM. A total of 21 novel quantitative trait loci (QTL) were identified in 13 chromosomes representing all three genomes of wheat. The trait-wise highest number of QTL was identified for GPC (10 QTL), followed by GZnC (six QTL), GFeC (three QTL), and TKW (two QTL). Four novel stable QTL (QGFe.iari-7D.1, QGFe.iari-7D.2, QGPC.iari-7D.2, and QTkw.iari-7D) were identified in two or more environments. Two novel pleiotropic genomic regions falling between Xgwm350-AX-94958668 and Xwmc550-Xgwm350 in chromosome 7D harboring co-localized QTL governing two or more traits were also identified. The identified novel QTL, particularly stable and co-localized QTL, will be validated to estimate their effects on different genetic backgrounds for subsequent use in marker-assisted selection (MAS). Best QTL combinations were identified by the estimation of additive effects of the stable QTL for GFeC, GZnC, and GPC. A total of 11 RILs (eight for GZnC and three for GPC) having favorable QTL combinations identified in this study can be used as potential donors to develop bread wheat varieties with enhanced micronutrients and protein.
Keywords: biofortification, QTLs, malnutrition, SNPs, SSRs, mapping INTRODUCTION Wheat (Triticum spp.) is a major staple cereal crop contributing about 20% calories to the diet and at least 30% of Fe and Zn intake worldwide. Even though it has the highest levels of micronutrients among the three major cereals viz., wheat, rice, and maize, most wheat-based diets fail to deliver the required quantity of essential nutrients, such as iron and zinc. Malnutrition due to insufficient intake of micronutrients, such as iron and zinc, has been recognized as one of the major global health issues affecting nearly three billion people across the globe. The intensity of the risk is high in nations dominated by cereal-based diets (1). Around 25% of the global population is affected by anemia because of Fe deficiency (2), and the leading risk groups for this global public health concern are children 0-5 years of age, and pregnant and lactating women. Anemic complex due to severe iron deficiency leads to several lifethreatening diseases, namely, chronic kidney and heart failure, and inflammatory bowel disease (3).
Zinc is an essential element for a wide range of biochemical and immunological functions, and acute zinc deficiency leads to major health difficulties, such as altered growth and development, immunity, pregnancy, and neurobehavioral adversities (4). Estimates indicate that around 17% of the global population suffers from zinc deficiency-related diseases (5). Grain protein quantity and quality determine both the nutritional and endproduct quality of wheat. Lack of secondary immunity due to protein energy malnutrition (PEM) is one of the common causes of several infections in humans. Acute PEM in children is clinically defined as marasmus (chronic wasting) or kwashiorkor (edema and anemia) (6). Chronic PEM in children results in impaired cognitive development (7). Micronutrient malnutrition and PEM are leading risk factors for health loss in developing countries, with pregnant women and young children forming the most vulnerable groups (8).
Micronutrient and protein malnutrition can be overcome by consuming nutrient-rich diverse diet and/or by supplementation and fortification. However, the majority of populations in which the malnutrition problem is alarming may not be able to afford either of the two options, particularly the remote rural poor (9). Moreover, these interventions are not sustainable. Enhancing the nutritive levels of crop plants by conventional and molecular breeding approaches, termed as "biofortification, " has been recognized as a cost-effective and sustainable approach to reduce global protein and micronutrient malnutrition. Currently, the development of biofortified crop varieties in many countries has gained momentum, particularly after reaching self-sufficiency in food grains.
Grain mineral density depends on a plethora of physiological and biochemical processes, such as mineral absorption, translocation, redistribution, and remobilization to the sink, which makes micronutrient accumulation in grain a complex trait (10). Therefore, breeding programs need to be re-oriented to broaden the genetic base using wild relatives and landraces, and dissecting the genetic basis of these nutritional quality traits (11). Landraces are one of the most important sources of wheat biofortification with high levels of micronutrients (12).
Conventional breeding approaches have been successfully used to incorporate higher grain zinc content into elite breeding materials by crossing high-yielding elite wheat lines with A tauschii-based synthetic hexaploid wheats or Triticum spelta accessions (13). Substitution lines of the 6B chromosome obtained from Triticum dicoccoides are one of the most common genetic resources to improve zinc concentration in wheat (14). The Gpc-B1 locus mapped on the short arm of the 6B chromosome, derived from T dicoccoides, has a pleiotropic effect on zinc and iron in addition to grain protein (15). An NAC transcription factor (NAM-B1) encoded by Gpc-B1 is responsible for the increase in zinc as well as iron levels, possibly by stimulating leaf senescence, and thus remobilization of zinc and iron from flag leaves into seeds (16). Synthetic wheat derived from Ae. tauschii contains higher grain zinc and can act as a valuable genetic resource to increase the grain zinc levels of cultivated wheat (17).
Genetic dissection of complex nutritional traits is important for their improvement through marker-assisted selection (MAS). Identification of tightly linked molecular markers to the genomic regions governing the traits would help in the improvement of otherwise difficult to breed complex traits like protein and micronutrients. Reports have indicated significant effects of the environment and genotype-byenvironment interaction (GEI) in the expression of PC and TKW (18-23), iron, and zinc (19,24,25). Molecular mapping of polygenic traits by identifying quantitative trait loci (QTL) harboring genes for protein, micronutrient, and TKW would allow plant breeders to more efficiently develop biofortified cultivars.
Previous mapping of the same RIL population was carried out with 136 polymorphic SSR markers, which led to the identification of 16 QTL for four traits (55). The linkage map was coarse because of low marker frequency per chromosome ranging from 6 (1A and 2A) to 11 markers per chromosome (7B). Also, no QTL were mapped on the D genome because of low marker coverage. In this study, a 35K SNP chip was used for genotyping the RIL population, and a combined dataset of SSR and SNP markers was used to identify QTL for nutritional traits.

Plant Material
A set of 286 RILs from a cross between Indian bread wheat variety WH 542 and a synthetic derivative (T. dicoccon PI94624/Ae. tauschii [409]//BCN) received from CIMMYT (International Maize and Wheat Improvement Center), Mexico, was used in the earlier mapping study with SSR markers (55). A subset of 163 randomly selected RILs from this population was used for this investigation.

Field Trials and Phenotyping
The details of field experimentation, sample collection, and phenotyping have been described in detail in the earlier study (55). The phenotypic data for GFeC, GZnC, GPC, and TKW recorded for the earlier study were converted into the best linear unbiased predictors (BLUPs) and used in this study. Phenotypic correlations among traits, heritability, and ANOVA were conducted using the MetaRv6.0 (Multi Environment Trial Analysis with R) software. BLUPs of each RIL obtained for an individual year and combined across years were used further in QTL analysis. Phenotypic data of all the six environments are presented as Supplementary Table 1.

Genotyping
RILs and parental genomic DNA were extracted from the leaves of 21-day-old seedlings by following the CTAB method of Murray and Thompson (56).

Genotyping With Simple-Sequence Repeat Markers
A total of 714 SSR markers (57,58) were used for the parental polymorphism survey. These selected 714 SSRs cover all the chromosome arms of the bread wheat genome. Polymorphic markers and genotypic data are presented as Supplementary Table 2.

Linkage Analysis and Quantitative Trait Locus Mapping
Monomorphic markers between the two parents and markers with more than 30% missing data and minor allele frequency ≤5 and ≥95% were eliminated. Furthermore, markers that showed significant segregation distortion (p < 0.0001) from the expected Both linkage and QTL analysis were conducted with the IciMapping v4.2 software (http://www.isbreeding.net). The chromosome location of SNP inferred by BLAST of the sequences and previously mapped SSR markers (55) was used as the anchoring information. A LOD threshold of 3 was specified for grouping the markers. After all the markers were correctly grouped, they were ordered using the k-Optimality algorithm. Then, Rippling was done to fine-tune the ordered chromosomes in the linkage groups using a 5 cM window size. ICIM-ADD method was employed, which conducts inclusive composite interval mapping for identifying QTL. Missing phenotypic data were considered as deletion during QTL mapping and a relaxed threshold LOD score of 2.5 was specified for declaring significant QTL.

In silico Analysis of Quantitative Trait Loci
An in silico search of candidate genes was performed in the Ensemble Plants database (http://plants.ensembl.org/index.html) of the bread wheat genome with the Basic Local Alignment Search Tool (BLAST) using default parameters. The sequences of the markers present within the peak of the QTL and the flanking markers were used to conduct the search.

Variability, Heritability, and Trait Correlations
The heritability and variance components of GFeC, GZnC, GPC, and TKW in a RIL population are presented in Table 1.
Parents were contrasting for all the studied traits and P2 was superior over P1 with 43, 31, 26, and 23%, respectively, for GFeC, GZnC, GPC, and TKW. Environment-wise heritability ranged from 0.54 (GFeC at GBPUA&T_Y1) to 0.96 (TKW at Frontiers in Nutrition | www.frontiersin.org ICAR-IARI_Y2 and GBPUA&T_Y2) across the traits. The lowest pooled heritability was observed for GZnC (0.77), whereas, highest pooled heritability was recorded for TKW (0.91). Trait heritability corroborates the variance components; GZnC (6.87%) and TKW (4.57%) recorded the highest and lowest CV, respectively. The genotypic variance was highly significant for all the studied traits across the environments. The environmentwise pooled mean is also represented graphically in Figure 1. All the studied traits exhibited a near-normal distribution (Figure 2). Genetic correlation coefficients among GFeC, GZnC, GPC, and TKW are presented in Table 2. All the associations among the studied traits are positive and significant, except, between TKW and GPC in the Pusa Bihar_Y1 (r g = −0.03) and Pusa Bihar_Y2 environments (0.1) ( Table 2).

Quantitative Trait Locus Mapping
The total genetic length of the linkage map was 7,057 cM, and it contained 736 SNPs and 100 SSRs. The chromosome and genome-wise distribution of markers is presented in Table 3. The B genome had the highest number of mapped markers (361) followed by the A (265) and D genomes (210). Chromosome-wise distribution of the markers ranged between 17 (6A chromosome) to 81 (7A chromosome).
The mapped QTL across the locations and years are presented in Table 4, and the linkage map with the identified QTL position is depicted in Figure 3. A total of 21 QTL were identified in 13 chromosomes representing all three genomes of wheat. Two, five, and 14 QTL were mapped on the A, B, and D genomes, respectively. Chromosome 7D represented the maximum number of seven QTL. A total of 21 QTL were mapped between 16 flanked regions ( Table 4); the maximum number of four QTL was identified between flanking markers Xgwm350-AX-94958668, followed by three QTL between Xwmc550-Xgwm350 in the7D chromosome. Trait-wise highest QTL were identified for GPC (10 QTL), followed by GZnC (six QTL), GFeC (three QTL), and TKW (two QTL). QTL for GFeC were mapped in chromosomes 6D and 7D; for GZnC in chromosomes 3B, 1D, 2D, and 7D; for GPC in chromosomes 1A,7A, 5B, 6B, 3D, 4D, 5D, and 7D and for TKW in chromosomes 1B and 7D.

Quantitative Trait Locus Additive Effects
The additive effects of the stable QTL were investigated for GFeC, GZnC, and GPC (

DISCUSSION
Genetic biofortification is the most cost-effective and sustainable strategy to control malnutrition. Understanding the genetic basis of complex traits like micronutrients, protein, and thousand kernel weight by QTL mapping will help in devising appropriate breeding strategies through MAS. The expression of all the studied traits in this study is greatly affected by the environment and GEI. Similar results of greater magnitude of the environment and GEI have been reported in previous studies for PC and TKW (18-20) and also for iron and zinc (13,19,59). Among the studied traits, GZnC was the most variable, whereas, TKW was the most stable. The lowest and highest pooled heritability was observed for GZnC and TKW, respectively, and a reverse trend was observed for CV (GZnC: 6.87; TKW: 4.57). Although both location and year effects were visible for all the traits, the magnitude of the location effect was found to be more pronounced than the year effect (Figure 1). The positive and significant associations among GFeC, GZnC, GPC, and TKW found in this study have also been reported in earlier studies (27,29). In most of the earlier studies, the associations between GPC and TKW were negative. In this study, the associations between GPC and TKW were significantly positive in four out of six studied environments and non-significant negative in the Pusa Bihar_Y1 environment (r g = −0.03), and non-significant positive in the Pusa Bihar_Y2 environment (r g = 0.1). Similar results of both positive and negative associations between GPC and TKW have also been reported in some earlier studies (19,27,45,60). The lowest and highest pooled heritability of GZnC and TKW, respectively, is also congruent with earlier studies (28,44).
The linkage map was constructed with 836 high-quality informative markers (736 SNPs + 100 SSRs) and utilized for the QTL analysis. In the previous study conducted on the same population, the SSR-based genetic map had a very low frequency of markers in the D genome (55). As a result, none of the QTL is localized in the D genome. The addition of SNPs improved D genome marker density and distribution, particularly in the 7D chromosome. Enrichment of genetic linkage map with SNPs greatly helped in the mining of novel genomic regions in the D genome. As a result, a maximum number of novel QTL were also identified in the D genome (14 QTL).
The D genome generally shows a low level of polymorphism in naturally occurring hexaploid bread wheat due to its wellknown evolutionary history and low recombination during its post-evolution era (61,62). For this reason, synthetic hexaploid wheats (SHWs) were created by crossing tetraploid durum wheats with multiple accessions of Ae. tauschii (the D genome donor), which increased the diversity of the D genome (63)(64)(65). Studies have shown that the D genome diversity of SHW is considerably greater than that of bread wheat (66,67). In this study, since a synthetic parent was involved in the cross, D genome polymorphism improved significantly and 25.1% (210) of the markers mapped on the D genome ( Table 3). A similar trend in marker distribution has been observed in earlier studies involving SHWs as one of the parents in creating mapping populations (30).
In the earlier study, a total of 16 QTL were identified including four QTL for GFeC, five QTL for GZnC, two QTL for GPC, and five QTL for TKW. The QTL together explained 20, 32, 24.1, and 32.3% of the phenotypic variance, respectively, for GFeC, GZnC, GPC, and TKW. In contrast to the earlier study, where the D genome was completely missing, this study identified the majority of QTL from the D genome. This is due to fairly good marker coverage in all the three genomes in this study, unlike the earlier study, wherein, marker coverage in the D genome was very sparse. The total phenotypic variance explained for all the QTL for any given trait except TKW was higher in this study compared with the earlier studies. Similarly, for two traits, i.e., GFeC and TKW, the highest explained phenotypic variance for an individual QTL was higher compared with the earlier identified QTL. The highest explained phenotypic variance for an individual QTL was 42.13% for GFeC and 26.53% for TKW compared with the earlier identified QTL with 6.8 and 10.4%, respectively.
The environment and GEI play a key role in the expression of quantitative traits. Identification of stable genotypes with the high buffering ability and of QTL is of paramount importance to use in breeding programs. Genetic dissection of complex traits by the identification of stable QTL will complement varietal development by molecular breeding approaches. In this study, four stable QTL (QTkw.iari-7D, QGFe.iari-7D.2, QGFe.iari-7D.1, and QGPC.iari-7D.2) were identified in two or more environments. QTkw.iari-7D was identified in all the six tested environments and pooled mean, followed by QGFe.iari-7D.2 and QGFe.iari-7D.1, which were identified in three environments Peptidase C78, ubiquitin fold modifier-specific peptidase 1/2 along with pooled mean. Stable QTL identified in more than two environments were also reported for GPC and TKW (42,45), GPC (39,47,68), GFeC and GZnC (30), and GFeC (33,37). Identification of the best combination of QTL effects by estimation of additive effects of the stable QTL will provide an opportunity to utilize RILs with the best combination as donors. The best combination of QTL for all the three biofortification traits in RILs was identified. There is no additional advantage of additive QTL over the individual QTL effects in the expression of GFeC. However, the QTL combination of the two QTL combinations, viz., QGZn.iari-2D.1 and QGZn.iari-7D.1, showed the highest average GZnC across the environments, and this combination was identified in eight RILs. For GPC, the four QTL combinations, viz., QGPC.iari-2A, QGPC.iari-5B.1, QGPC.iari-7D.1, and QGPC.iari-7D.2 showed the highest average GPC across the environments, and this combination was identified in three RILs. Although numerically additive QTL effects for GZnC and GPC are higher than the individual QTL effect, statistically they are at par.
Genomic regions harboring co-located QTL for two or more traits were also identified. This information is helpful in the simultaneous improvement of multiple traits without many additional interventions. Two common genomic regions associated with different co-localized QTL governing two or more traits were identified in chromosome 7D where the genomic region flanked between Xgwm350-AX-94958668 was associated with the maximum number of four co-localized QTL (QGFe.iari-7D.2, QGZn.iari-7D.2, QGPC.iari-7D.2, and QTkw.iari-7D). Another region flanked between Xwmc550-Xgwm350 was also associated with three co-localized QTL (QGFe.iari-7D.1, QGZn.iari-7D.1, and QGPC.iari-7D.1). Some of the other studies (13,15,16,(37)(38)(39)(40) have also identified such pleiotropic region(s) associated with two or more traits, namely, GFeC, GZnC, GPC, and TKW. High positive correlations observed in this study also strongly support the co-localization of genomic regions governing GFeC, GZnC, GPC, and TKW. Only few studies have reported the association of TKW in the same region as GPC or even with GZnC and GFeC (42,44,45,68). All these studies reported a positive correlation for GPC and TKW. Considering the positive correlations obtained between TKW and GPC in all the environments, except Pusa Bihar_Y1, in this study, it was not surprising to find such pleiotropic QTL (in 2A and 7D chromosomes). The co-location of GFeC, GZnC, and GPC is well documented. For example, the Gpc-B1 locus derived from T dicoccoides is effective in improving GFeC, GZnC, and GPC by 18, 12, and 38%, respectively (15,16).
The in silico BLAST search identified various potential candidate genes underlying QTL with high PV or pleiotropic QTL for GZnC, GFeC, GPC, and TKW (Table 6). Various QTL identified in chromosomes 2A and 7D were located in regions where gene coding for transcription factors, transporters, and kinase-like superfamilies was present. For example, the SANT domain (coded by TraesCS2A02G063800) is generally found in combination with domains of Zn finger type transcription factors, such as the C2H2-type and GATA-type transcription factors, the role of which has been suggested to be in Zn uptake and homeostasis in plants (69,70). Members of serine-threonine/protein kinase-like superfamilies are known to catalyze phosphorylation processes, thus controlling growth and development, and some are known to activate Zn channels and transporters (71). The well-characterized serine/threonineprotein kinase encoding gene in maize (KNR6; kernel number per row: six) has been shown to determine the kernel number and ear length in maize (72). Since both maize and wheat are members of the Poaceae/Gramineae family, it would be interesting to further investigate the functional role of the serine/threonine-protein kinase genes identified here (coded by TraesCS7D02G521200 and TraesCS2A02G063900).
In the past decade, the role of various transporters has been shown in regulating mineral homeostasis in plants. These transporters play critical roles in the transport of small peptides, secondary amino acids, glutathione conjugates, and mineral uptake. Many of these transporters have proven to be involved in long-distance iron transport or signaling in Arabidopsis (73). In this regard, an important role of the aluminum-activated malate transporter (ALM1), in combination with a Zn fingertype transcription factor (STOP1), has been shown in regulating iron homeostasis in Arabidopsis (74). In both the stop1 and almt1 mutants, the accumulation of Fe in the root apex was found to be greatly reduced (75).

CONCLUSION
We earlier reported QTL for different biofortification traits, viz., grain zinc, iron, protein, and thousand kernel weight utilizing an SSR-based genetic map of 286 RIL population developed between a cultivated bread wheat variety and a synthetic derivative. In this study, we added 736 informative SNPs and analyzed a smaller subset of the same population for these traits. New QTL were identified in this study, and many of these were found located in the D genome. The co-localization of QTL for different traits was also observed. Chromosome 7D, in particular, harbored seven and three co-localized QTL at different positions. This indicates that at least some common pathways may be involved in the uptake or accumulation of the micronutrients. Several consistent QTL over two or more environments for different traits are identified in this study as well. Best QTL combinations in RILs have been identified through additive effects, and these combinations would be potential donors to be utilized in future breeding programs. Furthermore, the identification of pleiotropic regions for GZnC, GFeC, GPC, and TKW suggests the possibilities for genetic improvement of GZnC and GFeC without compromising grain yield and GPC. Further fine mapping to identify linked or functional markers is envisaged.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Materials.

AUTHOR CONTRIBUTIONS
AS conceptualized the investigation. Field experimentation at Delhi location was conducted by AS, AA, GS, and SaS at Bihar location by IS and RS, at Pantnagar by JJ and GK. Genotyping was done by SuS and GK. Statistical analysis including QTL mapping was done by NDR and DS. Original draft was prepared by GK, AS, and NDR. Review and editing was done by AS, DS, and GK. All authors contributed to the article and approved the submitted version.

FUNDING
This work was a part outcome of a DBT funded project (BT/AGR/Wheat Bioforti/PH-II/2010) granted to AS. High zinc parent received and calibration of the XRF machine with glass based standards by HarvestPlus is duly acknowledged. Training received under ICAR-World bank funded NAHEP-CAAST project by NDR to analyze the molecular data is also acknowledged. Author's also acknowledge IFPRI/HarvestPlus funding support through grant no. 2020H6458.IIW.